Summary
Background
Major depressive disorder (MDD) is a leading cause of disability, with a twofold increase in prevalence in women compared to men. Over the last few years, identifying molecular biomarkers of MDD has proven challenging, reflecting interactions among multiple environmental and genetic factors. Recently, epigenetic processes have been proposed as mediators of such interactions, with the potential for biomarker development.
Methods
We characterised gene expression and two mechanisms of epigenomic regulation, DNA methylation (DNAm) and microRNAs (miRNAs), in blood samples from a cohort of individuals with MDD and healthy controls (n = 169). First, case-control comparisons were conducted for each omic layer. Second, we defined gene coexpression networks, followed by step-by-step annotations across omic layers. Third, we implemented an advanced multiomic integration strategy, with covariate correction and feature selection embedded in a cross-validation procedure. Performance of MDD prediction was systematically compared across 6 methods for dimensionality reduction, and for every combination of 1, 2 or 3 types of molecular data. Feature stability was further assessed by bootstrapping.
Findings
Results showed that molecular and coexpression changes associated with MDD were highly sex-specific and that the performance of MDD prediction was greater when the female and male cohorts were analysed separately, rather than combined. Importantly, they also demonstrated that performance progressively increased with the number of molecular datasets considered.
Interpretation
Informational gain from multiomic integration had already been documented in other medical fields. Our results pave the way toward similar advances in molecular psychiatry, and have practical implications for developing clinically useful MDD biomarkers.
Funding
This work was supported by the Centre National de la Recherche Scientifique (contract UPR3212), the University of Strasbourg, the Université Sorbonne Paris Nord, the Université Paris Cité, the Fondation de France (FdF N° Engt:00081244 and 00148126; ECI, IY, RB, PEL), the French National Research Agency (ANR-18-CE37-0002, BE, CMC, ADD, PEL, ECI; ANR-18-CE17-0009, ADD; ANR-19-CE37-0010, PEL; ANR-21-RHUS-009, ADD, BE, CMC, CCB; ANR-22-PESN-0013, ADD), the Fondation pour la Recherche sur le Cerveau (FRC 2019, PEL), Fondation de France (2018, BE, CMC, ADD) and American Foundation for Suicide Prevention (AFSP YIG-1-102-19; PEL).
Keywords: Depression, Transcriptomic, microRNA, DNA methylation, Sex differences, Multiomic integration
Research in context.
Evidence before this study
Developing molecular biomarkers to improve the diagnosis and management of complex psychiatric disorders, such as major depression, is a crucial endeavour. However, recent studies that examined easily accessible peripheral tissues, such as blood, have revealed limited performance when attempting to predict depression. Essentially, there is currently no validated biomarker to complement clinical evaluation by a psychiatrist. Previous efforts have encountered significant challenges, likely reflecting the large heterogeneity in genetic, epigenetic and environmental factors that contribute to depression in the general population. Interestingly, efforts in other medical fields have suggested that the analysis of multiple molecular layers may yield improvements.
Added value of this study
Here, we aimed to test the hypothesis that the simultaneous investigation of multiomic biomarkers from the same individuals may help improve the prediction of depression. To do so, we applied 3 genome-wide sequencing and microarray methodologies to generate expression profiles for protein-coding genes and microRNAs, and to assess DNA methylation levels, in peripheral blood samples from a naturalistic cohort of patients with depression and healthy controls. We then implemented a multiomic integrative framework that encapsulated correction for clinical covariates and feature selection within a cross-validation procedure. Our results indicate that the performance of depression prediction gradually increased as more molecular layers were considered. In addition, we found that performance also significantly improved when men and women were analysed separately. This adds significant value to the literature by indicating that combining several types of molecular measures has the potential to improve the diagnosis of mental disorders.
Implications of all the available evidence
While the added value of multiomic integration had already been illustrated for other types of medical conditions, our study paves the way toward similar advances in molecular psychiatry, and has practical implications for the development of clinically useful biomarkers of depression.
Introduction
Major depressive disorder (MDD) is a systemic disease defined by clinically significant changes in mood, cognition, sleep, and appetite. It is widespread, with a lifetime prevalence ranging from 2 to 21%,1 and the highest rates in the United States and some European countries.2,3 Starting at puberty, the female:male ratio is approximately 2 to 1,4 suggesting significant sex differences. While multiple therapeutic strategies are available, their efficacy is unsatisfactory,5 with more than 30% of patients exhibiting resistance to antidepressant medication. MDD is also recurrent and associated with comorbidities, overall representing a significant public health burden.6 In this context, developing molecular biomarkers to improve diagnosis, and ultimately personalise treatment, is critical to improving care.
The investigation of molecular dysregulation in MDD has been facilitated by rapid progress in high-throughput technologies. Available brain and peripheral studies, which largely focused on the transcriptome, have established a robust overview of the most frequently affected pathways, including immune and stress responses, inflammatory processes, neurotrophic factors and neurotransmitters, among others (for reviews, see7,8). Importantly, these studies also provide accumulating evidence that MDD biomarkers may vary significantly between males and females,9 urging researchers to directly examine, rather than simply control for, sex differences.10,11
Despite these advances, the clinical validity of molecular biomarkers of MDD remains limited. This likely reflects the involvement of multiple etiological factors, which complicates the understanding of the underlying pathophysiology. Over the last decade, family studies and genome-wide association studies (GWAS) have quantified the proportion of the risk for MDD attributable to genetic polymorphisms, with an estimated heritability of around 37%.12 Epidemiological studies, on the other hand, have shown that environmental factors, including stressful life events, also contribute significantly. At the interface of these factors, recent advances point towards a critical role for epigenetics, defined as mechanisms that mediate gene-by-environment interactions.13 Therefore, characterising epigenetic processes has the potential to yield biomarkers that may better capture the complexity of MDD.
Another important avenue relates to the integration of multiple types of molecular data. Characterising differences between patients with MDD and healthy individuals at the level of single omic layers is unlikely to fully describe molecular interactions leading to the disease, in part due to clinical heterogeneity and technical limitations inherent to each methodology. To overcome this difficulty, research on other medical conditions, such as cancer or chronic respiratory diseases,14,15 has suggested that integration of several molecular modalities represents a promising strategy for case–control classification. In the case of MDD, such efforts have mainly used “step-by-step” strategies, whereby sparse differences are identified individually at the level of each omic layer, using arbitrary thresholds, and then aggregated.16,17 These previous studies, however, did not comprehensively leverage the genome-wide and multiomic nature of available data, nor did they quantify how combining multiple layers may generate more relevant MDD biomarkers.
To address these challenges, the present work was designed to characterise and integrate transcriptomic data with two layers of epigenomic regulation, DNA methylation (DNAm) and microRNAs (miRNAs), generated using peripheral blood samples from patients with MDD (n = 80) and healthy controls (n = 89). Importantly, all analyses were conducted separately in each sex. First, we analysed each omic layer individually to extract genome-wide signatures of MDD that were subsequently validated with external datasets or meta-analyses. Second, we defined the network organisation of gene coexpression. This identified gene modules that were significantly associated with MDD and enriched for epigenomic dysregulation. Finally, a genome-wide and multiomic integration framework was developed, building on the Momix package18 and Similarity Network Fusion (SNF),19 both of which were encapsulated in a Cross-Validation (CV) procedure. This identified candidate biomarkers corresponding to subsets of features that most efficiently clustered patients and controls, and were then tested for stability through bootstrapping. Overall, results indicated that MDD biomarkers were more predictive when they were specifically identified in each sex, while multiomic integration gradually improved performance over single omics.
Methods
Ethics
The study was conducted in accordance with the Declaration of Helsinki and approved by the ‘Comité de Protection des Personnes Sud Méditerranée II’, France (study #2011-A00661-40), with written informed consent obtained from all participants.
Human cohort
Eligible participants were recruited during a naturalistic multi-centric cohort study registered at ClinicalTrials.gov (ID: NCT02209142).20 The study involved 8 departments of psychiatry in 6 French cities (Marseille, Montpellier, Nîmes, Tours, Besançon and Clermont-Ferrand). Participants were enrolled between 04/05/2012 and 04/03/2015. Cases met DSM-IV-TR criteria for a severe MDD episode at the time of blood sampling (17-item Hamilton Depression Rating Scale, HDRS, score ≥19; RRID:SCR_003686), and were treated as usual at inclusion, at the discretion of the treating psychiatrist. Exclusion criteria for both groups were: a history of substance use disorder in the past 12 months; a diagnosis of schizophrenia, psychotic or schizoaffective disorder according to DSM-IV (RRID:SCR_003682); a severe progressive medical disease; pregnancy; vaccination within the month preceding inclusion; and age under 18 years. In the present study, patients with bipolar disorder were also excluded. Healthy controls were free of any psychiatric disorder according to semi-structured interviews. The Childhood Trauma Questionnaire (CTQ)21 was administered to both controls and patients. Complete blood counts (total white blood cells, neutrophils, lymphocytes, monocytes, and platelets) were obtained using a Sysmex XN-10/XN-20 Hematology Analyzer (Norderstedt, Germany). Sex was self-reported by participants. Sample size was originally defined in May 2011, before registration, with the initial goal of investigating a single omic layer (mRNA). It was computed with the ssize.fdr R package, under the following assumptions for cases versus controls: (i) a mean difference in mRNA gene expression of 1, with a common standard deviation of 1.3; (ii) a false discovery rate (FDR), i.e., the expected proportion of transcripts wrongly declared over- or under-expressed, set at 5%; (iii) a power of 90%; (iv) a proportion of transcripts without any difference in expression set at 99.5%.
Using a two-sided test, a target sample size of 87 participants per group was obtained. While 248 participants were initially recruited (148 adults diagnosed with a current major depressive episode, MDE, and 100 healthy controls), the clinical exclusion criteria mentioned above and the exclusion of additional participants due to insufficient quantity and/or quality of nucleic acids led to a final sample size of n = 169 participants, corresponding to n = 80 patients with MDD and n = 89 healthy controls (Fig. 1, Figs. S1–S2 and Table S1). In this final sample, missing covariate values (body mass index, BMI, for 6 patients with MDD; blood counts for 14 patients with MDD) were imputed with the median of each covariate, computed in patients with MDD. Peripheral blood samples were collected from all participants and used as described below.
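The logic of an FDR-based sample size calculation can be approximated in a few lines. The sketch below is a minimal illustration, assuming a normal approximation of the two-sample test and the standard link between FDR, per-test significance level and average power (FDR = π0·α / (π0·α + (1 − π0)·power)). It is not the ssize.fdr algorithm, which relies on t-distribution-based power, so its result may differ slightly from the reported target; the function name `fdr_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def fdr_sample_size(delta, sigma, fdr, power, pi0):
    """Approximate per-group sample size for a two-sided, two-sample comparison
    under FDR control (normal approximation, hypothetical helper)."""
    # Per-test level that yields the target FDR when the average power equals `power`:
    # fdr = pi0 * alpha / (pi0 * alpha + (1 - pi0) * power), solved for alpha
    alpha = fdr * (1 - pi0) * power / ((1 - fdr) * pi0)
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile for the target power
    effect = delta / sigma              # standardised mean difference
    n = 2 * (z_alpha + z_beta) ** 2 / effect ** 2
    return math.ceil(n), alpha


# Assumptions listed above: difference of 1, SD of 1.3, FDR 5%, power 90%, 99.5% null transcripts
n_per_group, alpha = fdr_sample_size(delta=1.0, sigma=1.3, fdr=0.05, power=0.90, pi0=0.995)
print(n_per_group, alpha)
```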
Fig. 1.
Cohort characteristics and overview of data analysis strategies. a. Cohort description: summary table detailing the cohort statistics according to sex, including the number of controls and patients with MDD and the proportion of participants for which DNAm, mRNA and miRNA data were available. b. Omics data integration: summary of the analytical strategies employed in the study: (1) single-omic differential analyses; (2) gene co-expression analysis followed by step-by-step integration; (3) advanced multiomic integration. Expected outcomes of these analyses included the description of a molecular signature of MDD, the identification of potential MDD biomarkers, and the development of a predictive model for distinguishing patients with MDD from controls. BMI: body mass index; HDRS: Hamilton depression rating scale; RIN: RNA integrity number.
DNA methylation arrays
DNA was extracted from venous blood collected in EDTA tubes, using the PureLinkTM genomic DNA mini kit (Invitrogen, Cat #K182002), and deaminated using the EZ-96 DNA Methylation Kit (Zymo Research, Cat #D5004). Bisulfite conversion efficiency was controlled by qPCR, with 1 assay targeting a methylated region of DNAJC15 and 2 assays targeting the GNAS locus (both unmethylated and methylated alleles). Deaminated DNA derived from blood, amplified in parallel, served as a positive control. All samples passed quality control (i.e., Ct values for either the 2 GNAS loci or the DNAJC15 locus reached the amplification threshold no more than 5 cycles after the positive control). DNAm levels were then measured using Infinium MethylationEPIC v1 BeadChip microarrays (interrogating approximately 850,000 CpG sites), following Illumina's recommendations (Illumina, San Diego, CA, Cat #WG-317-1003). Probes were annotated to genes using Illumina's probe-gene annotation manifest (https://support.illumina.com/array/array_kits/infinium-methylationepic-beadchip-kit/downloads.html) for both single-omic and gene network analyses. In the latter, a probe was annotated to a given module when it was annotated in the manifest to a gene belonging to that module.
RNA-sequencing
Venous blood was passed through LeukoLOCK™ filters (Life Technologies, Ambion, Cat# AM1933) to eliminate red blood cells, platelets, and plasma. Leukocytes trapped on LeukoLOCK filters were lysed with TRI reagent (Ambion, Cat #AM9738) and mixed with 1-bromo-3-chloropropane (Sigma–Aldrich, St. Louis, MO, USA, Cat #B62404). After centrifugation, total RNA from the aqueous phase was precipitated with ethanol, purified on a spin cartridge, washed, eluted with 0.1 mM EDTA, and submitted to DNase treatment (DNA-free™ kit, Life Technologies, Ambion, Cat #AM1906). RNA quantity and quality were assessed using a NanoDrop-1000 (Thermo Fisher Scientific) and a 2100 Bioanalyzer (Agilent). Total RNA was used for both miRNA-sequencing and RNA-sequencing. RNA-sequencing libraries were prepared from 500 ng of total RNA, using poly-A capture and the QIAseq Stranded mRNA Select kit (Qiagen, Cat #180451) and the TruSeq Stranded mRNA LT Sample Preparation Kit (Illumina, San Diego, CA, Cat #20020595) for females and males, respectively. Libraries were amplified by PCR, quantified by microcapillary electrophoresis, pooled at equimolar concentrations and sequenced on a NovaSeq 6000 for females (100 bp, paired end) or a HiSeq 4000 for males (50 bp, single end), generating a mean of 40.09 ± 0.80 million reads/sample.
MiRNA-sequencing
MiRNA libraries were prepared using the Bioo Scientific NEXTflex Small RNA-Seq kit v3 (Bioo Scientific, Cat #NOVA-5132-06), following the manufacturer's instructions. Briefly, 500 ng of total RNA was used as input, and NEXTflex 4N adenylated adapters were ligated to the 3′ and 5′ ends of the RNA. After adapter ligation and clean-up with magnetic beads, first-strand cDNA was synthesised, cleaned, isolated and amplified by PCR (16 cycles). Libraries were profiled using a Fragment Analyzer (Advanced Analytical Technologies), quantified using the Qubit dsDNA HS assay (Life Technologies, Cat #Q32851), pooled, denatured and sequenced on an Illumina NextSeq 500, generating a mean of 8.6 ± 0.02 million reads/sample.
Raw data processing
For DNAm data, the R package ChAMP (v2.16.2, RRID:SCR_012891) was used.22 For RNA-sequencing, raw reads were trimmed with bbduk23 and aligned to the GRCh38.p12 human reference genome using STAR v2.5.3a (RRID:SCR_005622)24 and GENCODE v29 annotations (RRID:SCR_014966). Gene expression was quantified using HTSeq v0.11.2 (RRID:SCR_005514).25 For miRNA-sequencing data, reads were trimmed using cutadapt v1.18 (RRID:SCR_011841) and aligned following the QuickMIRseq pipeline,26 which uses 2 databases for alignment. First, a small RNA and mRNA database was generated with sequences collected from GRCh38.p12. Then, a miRNA/hairpin database was generated with sequences collected from miRBase (v22, www.mirbase.org, RRID:SCR_003152).27 Reads were aligned using bowtie v1.2.2 (RRID:SCR_005476) with default parameters, and quantified using HTSeq-count (RRID:SCR_011867). The miRTarBase R package (RRID:SCR_017355) was used to identify targets of differentially expressed miRNAs.28
Analyses of covariates
To identify sources of biological or technical variation (Figs. S1–S2), the variancePartition R package (RRID:SCR_019204)29 was applied to miRNA- and RNA-sequencing data, and the ChAMP R package (with the champ.SVD function, which uses a singular value decomposition) to DNAm data. Covariates (age, BMI, HDRS, RIN) were compared between controls and patients with MDD using a 2-way ANOVA (factors: sex and MDD status; see Table S1). For miRNA- and RNA-sequencing, white blood cell counts were used in differential expression models (only polynuclear neutrophils and lymphocytes were included, due to collinearity among white blood cell subtypes, as expected; see below and Fig. S2). For EPIC arrays, following the ChAMP pipeline, blood cell composition was inferred using Houseman's method.30 Since smoking status was significantly associated with MDD (p-value = 2.24 × 10−3, chi-squared test), it was not included in final models.
Differential expression and methylation analyses
To compare MDD cases and controls, sex-specific differential analyses were conducted independently on each omic. For EPIC arrays, the following probes were discarded: those corresponding to non-CG sites; showing a detection p-value > 0.01 in ≥1 sample; having bead counts < 3 in ≥5% of samples; flagged as SNP-related in previous work;31 or aligning to multiple locations or to the X or Y chromosomes (n = 724,504 probes remaining). DNAm data were adjusted for covariates using the ComBat function from the sva R package (RRID:SCR_012836),32 without applying the preservation function (see Results). Then, differentially methylated probes (DMPs, corresponding to individual CpG sites) were identified using ChAMP. For miRNA- and RNA-sequencing data, lowly expressed miRNAs (keeping those with ≥1 read in ≥60% of either cases or controls; n = 735 remaining) and RNAs (keeping those with >10 reads on average; n = 16,287 remaining in females and n = 16,290 in males) were first filtered out, followed by covariate adjustment and the identification of differentially expressed miRNAs (DEmiRNAs) and genes (DEGs) using DESeq2 (RRID:SCR_015687, Wald test, WT).33 Each modality was adjusted for specific covariates based on biological factors outlined in the literature and on our analysis of potential sources of variance (see above and Figs. S1–S2): DNAm data were adjusted for age, slide, array, BMI, and blood cell composition (estimated using Houseman's method); mRNA data were adjusted for age, BMI, RNA integrity number (RIN), lymphocyte percentage, and polynuclear neutrophil count; miRNA data were adjusted for library preparation batch, age, BMI, RIN, lymphocyte percentage, and polynuclear neutrophil count. For pooled analyses of male and female data, sex was included as an additional covariate to account for potential sex-based differences.
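The two count-based expression filters described above can be sketched as follows. This is an illustrative NumPy version, not the code used in the study; the function names are hypothetical, and the input is assumed to be a raw count matrix with features in rows and samples in columns.

```python
import numpy as np


def filter_mirnas(counts, is_case, min_reads=1, min_frac=0.60):
    """Keep miRNAs with >= min_reads reads in >= min_frac of cases OR of controls.

    counts:  (n_features, n_samples) raw count matrix.
    is_case: boolean NumPy array with one entry per sample (True = MDD case).
    Returns a boolean mask over features.
    """
    detected = counts >= min_reads
    frac_cases = detected[:, is_case].mean(axis=1)
    frac_controls = detected[:, ~is_case].mean(axis=1)
    return (frac_cases >= min_frac) | (frac_controls >= min_frac)


def filter_genes(counts, min_mean=10):
    """Keep genes with more than `min_mean` reads on average across samples."""
    return counts.mean(axis=1) > min_mean


counts = np.array([[0, 0, 5, 5, 5], [2, 2, 2, 0, 0], [0, 0, 0, 0, 1]])
is_case = np.array([True, True, False, False, False])
print(filter_mirnas(counts, is_case))  # first two miRNAs pass, the third does not
```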
Gene ontology
Functional enrichments were computed using: (i) for DNAm data, the missMethyl R package34 with the top 10,000 DMPs; (ii) for mRNA data, the WebGestalt implementation of Gene Set Enrichment Analysis (GSEA, RRID:SCR_003199, weighted Kolmogorov–Smirnov statistic, KS),35 with 10,000 permutations on GO, KEGG, WikiPathways or Reactome databases; and (iii) the fgsea R package (RRID:SCR_020938)[preprint]36 when using lists of DEGs, CpGs or DEmiRNAs from previous studies as gene sets.
Rank–rank hypergeometric overlap (RRHO) analysis
To compare female and male data, we used Rank–Rank Hypergeometric Overlap (RRHO2, RRID:SCR_022754),37 as described previously,38,39 using the R package available at: https://github.com/Caleb-Huo/RRHO2. For each omic, results from the differential analysis in each sex were ranked based on the following metric: −log10(p-value) × sign(log2 fold change). Then, the RRHO2 function was applied to the 2 lists with default parameters (step size equal to the square root of the list length). Of note, to enable the processing of the large number of DNAm probes, we implemented a parallelised version of RRHO2 using the mcmapply function (see Code Availability). The significance of hypergeometric overlaps between female and male changes is reported as −log10(p-values), with p-values corrected using the Benjamini–Yekutieli procedure.
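The ranking metric and a single cell of an RRHO map can be illustrated as below. This is a simplified, hypothetical sketch: it computes only the concordant "up-up" overlap for one pair of thresholds, whereas RRHO2 scans all thresholds with a fixed step and also evaluates the "down-down" and discordant quadrants. It assumes that both ranked lists contain the same gene universe.

```python
import math


def rrho_metric(pvalue, log2fc):
    """Signed ranking metric: -log10(p) x sign(log2 fold change)."""
    sign = (log2fc > 0) - (log2fc < 0)
    return -math.log10(pvalue) * sign


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable (upper-tail overlap p-value)."""
    total = math.comb(population, draws)
    upper = min(successes, draws)
    return sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    ) / total


def rrho_cell(rank_a, rank_b, i, j):
    """Overlap between the top-i genes of list A and the top-j genes of list B.

    rank_a, rank_b: gene identifiers sorted by decreasing ranking metric
    (e.g., females and males). Returns (overlap size, hypergeometric p-value).
    """
    top_a = set(rank_a[:i])
    top_b = set(rank_b[:j])
    k = len(top_a & top_b)
    return k, hypergeom_sf(k, len(rank_a), i, j)


females = ["g1", "g2", "g3", "g4", "g5", "g6"]
males = ["g1", "g2", "g4", "g3", "g5", "g6"]
print(rrho_cell(females, males, 2, 2))  # both top-2 genes are shared
```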
Stepwise multiomic gene network integration
Coexpression network construction
For Weighted Gene Co-expression Network Analysis (WGCNA, RRID:SCR_003302), RNA-sequencing gene counts were adjusted using linear models (for the same covariates as above), after variance stabilisation normalisation.33 Then, WGCNA was performed on adjusted counts in males and females separately.40 The optimal set of WGCNA parameters was selected to maximise the odds ratio of the overlaps between gene modules and pathways of the Reactome database41 (power = 13, MinModuleSize = 20 for males; power = 9, MinModuleSize = 15 for females; DeepSplit = 4, CutHeight = 0.1 for both). Gene modules were tested for preservation in other blood transcriptome datasets using the Zsummary statistics of WGCNA's modulePreservation function.
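WGCNA builds its network from a soft-thresholded correlation matrix, in which the `power` parameter above is the exponent applied to gene-gene correlations. The sketch below illustrates this step only; topological overlap, hierarchical clustering and the dynamic tree cut (controlled by DeepSplit and MinModuleSize) are omitted. It assumes an unsigned network by default, which is WGCNA's default network type; the type used in the study is not stated here, and the function name is hypothetical.

```python
import numpy as np


def soft_threshold_adjacency(expr, power, signed=False):
    """Weighted adjacency matrix from a samples x genes expression matrix.

    unsigned: a_ij = |cor_ij| ** power
    signed:   a_ij = ((1 + cor_ij) / 2) ** power
    """
    cor = np.corrcoef(expr, rowvar=False)  # gene-gene Pearson correlations
    if signed:
        return ((1 + cor) / 2) ** power
    return np.abs(cor) ** power


# Example: a higher power shrinks weak correlations much more than strong ones
expr = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 1.0], [3.0, 6.0, 2.0], [4.0, 8.0, 0.0]])
print(soft_threshold_adjacency(expr, power=9).round(3))
```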
Gene module annotations
Gene modules were tested for enrichment in: (i) differences in gene expression identified in each sex, using GSEA; (ii) targets of DEmiRNAs, using a hypergeometric test and the fgsea::fora function; (iii) DNA methylation dysregulation, using fgsea and the following approach: each module was converted into a set of CpGs according to Illumina's probe-gene annotation manifest, and tested for enrichment in the distribution of all CpG probes, ranked according to the direction and significance of their dysregulation in MDD; (iv) single nucleotide polymorphisms (SNPs, assigned to their nearest gene within 1 Mb) associated with MDD, bipolar disorder, self-reported childhood maltreatment (CTQ scores) or BMI, using MAGMA (RRID:SCR_005757)42 and summary statistics from previous GWAS.43, 44, 45, 46, 47 Associations between each module's eigengene and MDD status, HDRS or CTQ scores were evaluated using Spearman correlations. To establish a hierarchical order of relevance among modules associated with MDD, we implemented a Module Prioritisation Score corresponding to a ranking based on the weighted average of normalised enrichment scores (see below).
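A module eigengene is the first principal component of the standardised expression of a module's genes, and its association with clinical variables was tested with Spearman correlation. The NumPy sketch below illustrates both steps under these assumptions; it mirrors the general principle of WGCNA's moduleEigengenes (eigengene sign aligned with average module expression) but is not that function, and the function names are hypothetical.

```python
import numpy as np


def module_eigengene(expr_module):
    """First principal component (unit norm) of a samples x genes module matrix."""
    z = (expr_module - expr_module.mean(axis=0)) / expr_module.std(axis=0, ddof=1)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]
    # Orient the eigengene so that it correlates positively with average module expression
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene


def average_ranks(x):
    """Ranks starting at 1, with tied values receiving their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of tie-averaged ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


expr = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0], [4.0, 4.0]])
mdd_status = [0, 0, 1, 1]  # binary status: ties are handled by average ranks
print(spearman(module_eigengene(expr), mdd_status))
```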
Module prioritisation score
For each module and each test (enrichment, correlation or association test), sub-scores were computed as follows. First, we calculated sub-scores based on enrichment tests for MDD-associated mRNA dysregulation and DNAm changes. To compute these scores for each module, the GSEA outputs were used as follows:
For DEmiRNA target enrichments, the outputs of the Over-Representation Analysis (ORA) tested for each DEmiRNA and each module were derived as follows:
Then, the correlation coefficients between module eigengenes and clinical variables such as CTQ scores, HDRS (RRID:SCR_003686) and MDD status were used to derive scores giving priority to modules with significant correlations and higher absolute correlation coefficients with these variables. These scores were computed as follows:
Finally, as described in the section “Stepwise multiomic gene network integration”, MAGMA was used to test for enrichment in SNPs associated with MDD, bipolar disorder, CTQ scores or BMI. From these results (available in Table S10), multiple scores were derived as follows:
These sub-scores were then aggregated for each module to reflect its degree of association with MDD. Priority was given to enrichments for differential analysis results, with a penalty applied to modules associated with the BMI GWAS (BMI being regarded as a confounding factor). The final Module Prioritisation Score was calculated as follows:
where the star indicates that sub-scores were normalised prior to their aggregation, to ensure unbiased comparison. Min–max normalisation was used to rescale all sub-scores to the range [0, 1].
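Because the exact sub-score formulas and weights are not reproduced in this section, the sketch below only illustrates the two generic operations that are described: min–max normalisation of each sub-score across modules, and a weighted average in which the BMI-related sub-score acts as a penalty. The weights, the sub-score names and the choice to subtract the penalised term are hypothetical.

```python
def min_max(values):
    """Rescale sub-scores to [0, 1]; constant inputs map to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def prioritisation_scores(subscores, weights, penalised=("bmi",)):
    """Weighted average of min-max normalised sub-scores, one value per module.

    subscores: dict name -> list of raw sub-scores (one value per module).
    weights:   dict name -> positive weight (HYPOTHETICAL values).
    Sub-scores listed in `penalised` are subtracted instead of added (assumption).
    """
    normalised = {name: min_max(vals) for name, vals in subscores.items()}
    n_modules = len(next(iter(subscores.values())))
    total_weight = sum(weights.values())
    scores = []
    for m in range(n_modules):
        s = 0.0
        for name, vals in normalised.items():
            sign = -1.0 if name in penalised else 1.0
            s += sign * weights[name] * vals[m]
        scores.append(s / total_weight)
    return scores


subs = {"mrna": [1.0, 3.0, 2.0], "dnam": [0.0, 0.0, 4.0], "bmi": [0.0, 2.0, 0.0]}
print(prioritisation_scores(subs, {"mrna": 2.0, "dnam": 1.0, "bmi": 1.0}))
```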
SNF Label Propagation
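Let us introduce $n_{train}$ and $n_{test}$, the number of samples in the train and test sets, respectively, $K$, the number of groups, and $Y \in \{0,1\}^{n_{train} \times K}$, the disjunctive table associated with the labels assigned in the train set; in other words, $y_{ik} = 1$ if sample $i$ is in group $k$, and $y_{ik} = 0$ otherwise. Label propagation provides a way to estimate
$$\hat{Y} \in [0,1]^{n_{test} \times K}$$
that characterises the probability for each sample in the test set to belong to each group. This estimation is performed through the following algorithm:
Algorithm: Label Propagation principle
| Step | Instruction |
|---|---|
| 1 | Data: $W$, $Y$, $T$. |
| 2 | Result: $\hat{Y}$. |
| 3 | Init: $F^{(0)} \in [0,1]^{(n_{train}+n_{test}) \times K}$ such that $F^{(0)} = \begin{pmatrix} Y \\ 0_{n_{test} \times K} \end{pmatrix}$; |
| 4 | $P = D^{-1} W$; |
| 5 | $t = 0$; |
| 6 | while $t < T$ do |
| 7 | $F^{(t+1)} = P F^{(t)}$ (1) |
| 8 | $F^{(t+1)}_{i\cdot} = y_{i\cdot}$ for $i = 1, \dots, n_{train}$; |
| 9 | $t = t + 1$; |
| 10 | end |
| 11 | return $\hat{Y}$, the last $n_{test}$ rows of $F^{(T)}$. |
where $W \in \mathbb{R}^{(n_{train}+n_{test}) \times (n_{train}+n_{test})}$ denotes the pairwise similarity matrix between all samples (both from the train and test sets, train samples first) computed by SNF, $D$ is the diagonal matrix with $D_{ii} = \sum_{j} W_{ij}$, $T$ is the maximum number of iterations and $0_{n_{test} \times K}$ is a matrix of dimension $n_{test} \times K$ only made of zeros. From equation (1), $\hat{y}^{(t)}_{ik}$, the estimation at step $t$ of the probability of the test sample $i$ to belong to group $k$ (i.e., $F^{(t)}_{ik}$ for $i > n_{train}$), can be written as follows:
$$\hat{y}^{(t+1)}_{ik} = \frac{\sum_{j=1}^{n_{train}} W_{ij}\, y_{jk} + \sum_{j=n_{train}+1}^{n_{train}+n_{test}} W_{ij}\, \hat{y}^{(t)}_{jk}}{\sum_{j=1}^{n_{train}+n_{test}} W_{ij}} \qquad (2)$$
It can be interpreted as a weighted mean of the probability of each sample to belong to group $k$ (if a sample comes from the train set, this probability is either 0 or 1), where the weights depend on the similarity between sample $i$ and all the others. In equation (2), the only elements that depend on step $t$ are the $\hat{y}^{(t)}_{jk}$, and at $t = 0$ they are all equal to zero. Thus, at the first iteration, what matters is the similarity of the considered test sample to the train samples. Then, progressively, as the probability of each test sample increases in a specific group and decreases in all the others, the test samples' neighbourhood is modified, hence the notion of propagation.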
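The label propagation algorithm can be written compactly with NumPy. This minimal sketch assumes that the similarity matrix lists train samples first and test samples last, as in the notation above; `n_iter=1000` is an illustrative default, and the function name is hypothetical rather than part of any SNF implementation.

```python
import numpy as np


def label_propagation(W, Y_train, n_iter=1000):
    """Propagate train labels to test samples over a similarity graph.

    W:       (n, n) similarity matrix, train samples first, then test samples.
    Y_train: (n_train, K) disjunctive (one-hot) label matrix.
    Returns  (n_test, K) matrix of estimated group-membership probabilities.
    """
    n_train, n_groups = Y_train.shape
    n_test = W.shape[0] - n_train
    # Init: known labels for train samples, zeros for test samples
    F = np.vstack([Y_train, np.zeros((n_test, n_groups))])
    P = W / W.sum(axis=1, keepdims=True)  # row-normalised matrix D^{-1} W
    for _ in range(n_iter):
        F = P @ F               # equation (1): similarity-weighted mean over all samples
        F[:n_train] = Y_train   # train labels are clamped at every iteration
    return F[n_train:]


W = np.array([
    [1.0, 0.9, 0.1, 0.1, 0.9, 0.1],
    [0.9, 1.0, 0.1, 0.1, 0.9, 0.1],
    [0.1, 0.1, 1.0, 0.9, 0.1, 0.9],
    [0.1, 0.1, 0.9, 1.0, 0.1, 0.9],
    [0.9, 0.9, 0.1, 0.1, 1.0, 0.1],
    [0.1, 0.1, 0.9, 0.9, 0.1, 1.0],
])
Y_train = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
print(label_propagation(W, Y_train).round(3))  # test sample 1 -> group 0, sample 2 -> group 1
```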
Multiomic integration
Pre-processing and sampling for CV
Pre-processing was applied separately for each train set of the CV procedure. For mRNA and miRNA, lowly expressed features were filtered out as described above. Then, a variance stabilisation normalisation was applied,33 followed by covariate corrections, as described above. For DNAm data, ComBat was used to adjust for covariates, with 2 modifications: (i) we used neuroComBat v1.0.5, an improved version dedicated to CV procedures (https://github.com/Jfortin1/ComBatHarmonization/tree/master/R);48,49 (ii) for the pooled and female cohorts, the Slide covariate was corrected separately in the train and test sets, as it comprised too many categories to be properly represented in train/test sets and corrected across all CV folds; for the male cohort, for similar reasons, both Slide and Array were corrected separately. Furthermore, to avoid situations in which a Slide category was represented by a single observation, which would impact correction models, all observations of each Slide category were assigned together to either the train or the test set. Due to computational limitations, DNAm data were pre-filtered, with the 10% most variable CpGs selected for downstream analyses.
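The key constraint in this paragraph is that each pre-processing model is learned on the train set and then applied unchanged to the test set. The sketch below illustrates this with two simple steps, selection of the most variable features and linear regression of covariates; it deliberately substitutes ordinary least squares for ComBat, and the class and function names are hypothetical.

```python
import numpy as np


def top_variance_features(X_train, fraction=0.10):
    """Indices of the `fraction` most variable features, computed on the train set only."""
    var = X_train.var(axis=0, ddof=1)
    n_keep = max(1, int(round(fraction * X_train.shape[1])))
    return np.argsort(var)[::-1][:n_keep]


class LinearCovariateAdjuster:
    """Regress covariates out of each feature; coefficients are learned on train data only."""

    def fit(self, X_train, C_train):
        design = np.column_stack([np.ones(len(C_train)), C_train])
        # Least-squares fit of every feature (column of X_train) on the covariates
        self.coef_, *_ = np.linalg.lstsq(design, X_train, rcond=None)
        return self

    def transform(self, X, C):
        design = np.column_stack([np.ones(len(C)), C])
        # Remove covariate effects while keeping each feature's train-set intercept
        return X - design[:, 1:] @ self.coef_[1:]


# Usage: fit on the train fold, then reuse the same model on the test fold
C_train = np.array([[0.0], [1.0], [2.0], [3.0]])
X_train = 5 + 2 * C_train
adjuster = LinearCovariateAdjuster().fit(X_train, C_train)
print(adjuster.transform(np.array([[25.0]]), np.array([[10.0]])))
```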
Pre-processing and sampling for bootstraps
The pre-processing described in the previous paragraph was applied to each bootstrap sample, with 3 modifications: (i) filtering for all 3 omics was undertaken prior to any resampling, to ensure that the input set of variables was identical across all bootstrap samples (important to derive confidence intervals for each feature); (ii) observations belonging to the same Slide category were no longer sampled together, as this would have impacted the randomness of the resampling (with replacement); (iii) as a consequence, a Slide category was sometimes composed of a single observation, which prevented estimating and correcting for intra-category variance. By default, the Python library neuroComBat corrects for a batch effect at the mean level and, if possible, at the variance level. This default option was used for the CV procedure. For the bootstrap procedure, as situations where only the mean could be corrected were frequent, we opted: (i) for the Slide covariate of DNAm data, to always correct only at the mean level (never at the variance level), to harmonise the analysis across samples and hopefully lead to more robust variable selection; (ii) for other covariates (less frequently concerned), to discard the bootstrap sample and draw another one (which was not feasible for Slide due to a high number of categories with few observations).
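A mean-only (location) batch correction, as used here for the Slide covariate, can be illustrated as follows. This hypothetical sketch simply shifts each batch so that its feature means match the overall means; unlike neuroComBat, it applies no empirical Bayes shrinkage, but it shows why a batch with a single observation remains tractable when no variance is estimated.

```python
import numpy as np


def mean_only_batch_correction(X, batches):
    """Location-only batch adjustment of a samples x features matrix.

    Each batch is shifted so that its feature means match the overall feature
    means; no variance scaling is applied, so singleton batches are allowed.
    """
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    grand_mean = X.mean(axis=0)
    corrected = X.copy()
    for b in np.unique(batches):
        idx = batches == b
        corrected[idx] += grand_mean - X[idx].mean(axis=0)
    return corrected


X = np.array([[1.0], [3.0], [10.0]])
print(mean_only_batch_correction(X, ["slide_a", "slide_a", "slide_b"]))
```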
Post-processing for Bootstrap
Both the JIVE and RGCCA implementations rely heavily on singular value decomposition (SVD), which can induce rotational indeterminacies. As the current implementation of RGCCA is sequential (i.e., each component is estimated one after another), this indeterminacy reduces to a sign indeterminacy, which is handled in the current version of the R package. JIVE, by contrast, estimates all components simultaneously. As a result, its weight matrices are similar across bootstrap samples only up to a rotation matrix. Following previous work,50 for each bootstrap sample, a Procrustes problem was solved to learn a rotation matrix realigning the current weight matrix (composed of all 10 estimated factors) to a reference one. The reference corresponds to the weight matrix estimated on the whole cohort (female, male or pooled), with the same combination of omics, without any resampling.
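The realignment step is an orthogonal Procrustes problem, which has a closed-form solution via SVD. The sketch below is a minimal NumPy illustration under that assumption; the solution is an orthogonal matrix, which may include a reflection in addition to a rotation, and the function name is hypothetical.

```python
import numpy as np


def procrustes_align(W_boot, W_ref):
    """Realign a bootstrap weight matrix onto a reference weight matrix.

    Solves min_R ||W_boot @ R - W_ref||_F subject to R^T R = I
    (orthogonal Procrustes problem). Both inputs: (n_features, n_factors).
    """
    U, _, Vt = np.linalg.svd(W_boot.T @ W_ref)
    R = U @ Vt  # optimal orthogonal matrix
    return W_boot @ R, R


# Usage: a rotated copy of the reference is mapped back onto the reference
rng = np.random.default_rng(0)
W_ref = rng.normal(size=(50, 10))          # e.g., 10 factors, as in the text
Q, _ = np.linalg.qr(rng.normal(size=(10, 10)))
aligned, R = procrustes_align(W_ref @ Q.T, W_ref)
print(np.allclose(aligned, W_ref))
```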
Statistics
Feature selection
Only participants with complete data across all 3 omic blocks were included in this integration. Building on the Momix benchmark,18 we encapsulated 6 joint Dimension Reduction (jDR) methods in a CV procedure (RGCCA, JIVE, MCIA, MOFA, intNMF and SciKit-Fusion) in order to extract common variance between the HDRS matrix and every possible combination of 1 (mRNA, miRNA, DNAm), 2 (mRNA/miRNA, mRNA/DNAm, miRNA/DNAm) or 3 (mRNA/miRNA/DNAm) omics (7 possible combinations). For each combination of jDR method and omics, a factor matrix $F \in \mathbb{R}^{n \times r}$, representing the shared variance across the data tables considered, was derived (where $n$ is the number of samples and $r = 10$ the number of factors extracted). Among these 10 factors, $f_{k^*}$, the one that correlated most (in absolute value) with MDD status, was kept. Then, the weight vectors $w^{(m)}_{k^*}$ (where $m$ refers to one omic table among those considered, the HDRS matrix aside) used to compute $f_{k^*}$ were retrieved. These weights represent the contribution of each feature to the factor of interest. For each modality, only the top 10%-ranking elements of $w^{(m)}_{k^*}$ (in absolute value) were retained.
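The factor selection and weight thresholding steps can be sketched as follows, assuming that a jDR method has already returned a factor matrix and one weight matrix per omic. The data structures and the function name are hypothetical; actual jDR implementations expose factors and weights in their own formats.

```python
import numpy as np


def select_features(F, weights, status, top_fraction=0.10):
    """Keep the factor most correlated with MDD status and its top-weighted features.

    F:       (n_samples, n_factors) factor matrix from a jDR method.
    weights: dict omic name -> (n_features, n_factors) weight matrix.
    status:  (n_samples,) binary MDD status (0 = control, 1 = case).
    Returns the index of the selected factor and, per omic, the selected feature indices.
    """
    corr = np.array([abs(np.corrcoef(F[:, k], status)[0, 1]) for k in range(F.shape[1])])
    best = int(np.argmax(corr))
    selected = {}
    for omic, W in weights.items():
        w = np.abs(W[:, best])  # ranking on absolute weights
        n_keep = max(1, int(round(top_fraction * len(w))))
        selected[omic] = np.argsort(w)[::-1][:n_keep]
    return best, selected
```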
Clustering with SNF
To evaluate the ability of selected features to estimate meaningful clusters of participants with regard to MDD status, we used SNF.19 While SNF is an unsupervised integrative clustering technique, in the present work it was applied following supervised feature selection, and its parameters were tuned in a supervised manner. To do so, several sets of SNF parameters were evaluated through CV (every possible combination of values for the number of neighbours, the number of iterations and the kernel hyperparameter). Systematically, K = 2 clusters were estimated through SNF and compared with MDD status using the Area Under the ROC Curve (AUC), as SNF provides probabilities of cluster membership. Then, cluster labels were transferred from the train to the test set using label propagation (implementation provided by SNF). This consists of an iterative propagation to the test set of labels defined during training, based on a measure of similarity between samples. The set of parameters leading to the highest AUC averaged across all test sets was kept.
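Since SNF outputs cluster membership probabilities, the AUC can be computed directly from them. The sketch below uses the Mann–Whitney formulation of the AUC (ties counted as one half) and keeps the parameter set with the highest mean test AUC; the parameter tuples and function names are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_parameters(cv_results):
    """cv_results: dict parameter tuple -> list of test-set AUCs; highest mean AUC wins."""
    return max(cv_results, key=lambda params: sum(cv_results[params]) / len(cv_results[params]))


# Hypothetical grid: (number of neighbours, number of iterations) -> AUC per test fold
results = {(10, 20): [0.7, 0.8], (20, 20): [0.9, 0.6], (30, 10): [0.8, 0.85]}
print(best_parameters(results))
```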
Cross-validation
A repeated, stratified 5-fold CV (5 repetitions, Fig. S3) was used to limit overfitting. To ensure that each resampling was representative of the original dataset, stratification was systematically based on MDD status and sex (the latter for the pooled cohort only). The Slide covariate (specific to DNAm) was also accounted for in the resampling process. For each train set (4/5 of samples), pre-processing, feature selection and clustering were performed as described above. For the test set (the remaining 1/5), the variables selected on the train set were extracted and pre-processed by applying the models fitted on the train set, and label propagation was used to transfer inferred clusters from the train set to the test set. For comparison, the same CV procedure was applied to features corresponding to differential analysis results (estimated on each train set separately) and to all features without selection.
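The stratified resampling can be illustrated with a short generator. This is a minimal sketch, assuming that each participant is summarised by a stratum label (e.g. the tuple (MDD status, sex), optionally extended with the DNAm slide); the function name is hypothetical, and with many small strata fold balance is only approximate. In each split, pre-processing models and feature selection would be fitted on `train` only and then applied unchanged to `test`.

```python
import random
from collections import defaultdict

def stratified_repeated_kfold(strata, n_folds=5, n_repeats=5, seed=0):
    """Yield (train_indices, test_indices) for a repeated stratified K-fold CV.

    strata[i] is a hashable label for sample i, e.g. (mdd_status, sex). Samples of
    each stratum are shuffled and dealt round-robin across folds, so that every fold
    keeps approximately the original stratum proportions.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        by_stratum = defaultdict(list)
        for i, s in enumerate(strata):
            by_stratum[s].append(i)
        folds = [[] for _ in range(n_folds)]
        offset = 0
        for s in sorted(by_stratum, key=repr):  # fixed stratum order for reproducibility
            members = by_stratum[s]
            rng.shuffle(members)
            for j, i in enumerate(members):
                folds[(offset + j) % n_folds].append(i)
            offset += len(members)  # continue dealing where the previous stratum stopped
        for k in range(n_folds):
            test = sorted(folds[k])
            train = sorted(i for f, fold in enumerate(folds) if f != k for i in fold)
            yield train, test
```

With the defaults, this produces the 25 train/test splits (5 folds × 5 repetitions) over which all reported test AUCs are computed.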
Bootstrapping
A bootstrap resampling strategy51 was employed to evaluate feature stability for the 2 most accurate jDR methods (JIVE, RGCCA). B = 1000 bootstrap samples, each of the same size as the original cohort, were drawn with replacement. For the pooled cohort, each resampling was stratified by sex, to ensure that the sex ratio was representative of the original cohort. For each resampling, pre-processing and jDR-based feature selection were performed as described above. Two metrics were computed for each variable, each omic combination and each jDR method across the 1000 bootstrap samples. The first was the occurrence of selection, i.e., the number of times a feature both contributed to the factor most correlated with MDD status and ranked among the top 10% of weights (in absolute value) within its omic modality. The second was the sign ratio of the weights $w^{(m,b)}_{i,j^*}$ obtained across all bootstraps (i.e., the ratio between the numbers of positive and negative weights, with the smaller count as numerator), where $i$, $b$, $m$ and $j^*$ stand respectively for the feature, the bootstrap sample, the omic and the best factor index (according to correlation with MDD status). This ratio can be seen as a non-parametric estimate of the probability that a weight changes sign across bootstraps. It ranged between 1/B (as B samplings were undertaken) and 1 (as many negative as positive estimates). The lower this probability, the less likely a weight is to change sign across bootstraps, and therefore the more confidently it can be considered different from zero. These probabilities were adjusted using the Benjamini-Hochberg procedure.52 Stable features were defined by an adjusted probability <0.05 and selection in >80% of bootstraps.
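The two stability metrics and the final filter can be sketched as follows, assuming per-feature lists of bootstrap weights and selection counts are already available. Flooring the numerator at 1, so that the ratio spans [1/B, 1] as stated above, is our assumption about how a zero minority count was handled; all function names are hypothetical.

```python
def sign_ratio(weights):
    """Minority-sign count over majority-sign count across bootstraps.
    Numerator floored at 1 (assumption) so that the ratio lies in [1/B, 1]."""
    pos = sum(1 for w in weights if w > 0)
    neg = sum(1 for w in weights if w < 0)
    low, high = sorted((pos, neg))
    if high == 0:
        return 1.0  # no non-zero weight at all: treat as maximally unstable
    return max(low, 1) / high

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted values (step-up procedure, capped at 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # from the largest value down to the smallest
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted

def stable_features(weights, selected, B, alpha=0.05, min_occurrence=0.8):
    """Features with BH-adjusted sign ratio < alpha and selected in > min_occurrence of bootstraps.
    weights[f]: B bootstrap weights of feature f on the best factor;
    selected[f]: number of bootstraps in which f ranked in its omic's top 10% of weights."""
    names = list(weights)
    adjusted = benjamini_hochberg([sign_ratio(weights[f]) for f in names])
    return [f for f, q in zip(names, adjusted) if q < alpha and selected[f] / B > min_occurrence]
```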
Role of funders
The funders of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding authors had full access to all the data in the study and had final responsibility for the decision to submit for publication.
Results
Single-omic analyses identify mRNA, miRNA and DNAm changes in MDD
We first identified differentially methylated probes (DMPs), differentially expressed miRNAs (DEmiRNAs) and differentially expressed genes (DEGs) between patients with MDD and controls (see Tables S2–S4 for full genome-wide results). For DNAm, among the >700 k probes analysed (Fig. 2a), a limited subset met an FDR threshold <0.1 in females (81 hypomethylated and 74 hypermethylated), while none did in males. This reflects our analytical strategy, in which correction for covariates was conducted without the preservation function implemented by default in the popular ComBat/ChAMP package (see Methods). Indeed, we observed that, while this function largely preserved the ranking of probes in differential methylation results, it strongly inflated their statistical significance (i.e., lowered their p-values), likely increasing false positives (see Fig. S4 for results with preservation). Similar effects have already been reported for simulated data, and in studies investigating methylomic changes associated with obesity or genetic variants.53 As such, all downstream results presented in this study are based on analyses without preservation. Importantly, despite the moderate number of genome-wide significant DMPs in our conservative approach, concordance with results from previous studies was still detectable using the threshold-free GSEA algorithm (Table S5). Specifically, significant enrichments were observed for DMPs identified by Tao et al. when comparing drug-naive adolescents presenting a first MDD episode with healthy controls.54 Used as gene sets, these probes were significantly enriched among those with methylation changes in both our male (normalised enrichment score, NES = 1.04, padj = 5.74 × 10−2, KS) and female (NES = 1.05, padj = 2.70 × 10−3, KS) cohorts.
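For readers less familiar with the threshold-free GSEA statistic, the running-sum enrichment score from which the NES is derived can be sketched as below. This is a simplified illustration, assuming features are ranked by a signed differential statistic and using the weighted running sum with exponent p = 1; normalisation to the NES and permutation-based p-values are omitted, and the function name is hypothetical.

```python
def enrichment_score(stats, in_set, p=1.0):
    """Weighted Kolmogorov-Smirnov-like running-sum enrichment score.

    stats: per-feature statistics sorted in decreasing order;
    in_set: matching booleans (True if the feature belongs to the gene set).
    Returns the signed maximum deviation of the running sum from zero.
    """
    n_hit = sum(in_set)
    n_miss = len(stats) - n_hit
    hit_norm = sum(abs(s) ** p for s, m in zip(stats, in_set) if m)
    running, es = 0.0, 0.0
    for s, m in zip(stats, in_set):
        # members step up in proportion to |statistic|; non-members step down uniformly
        running += abs(s) ** p / hit_norm if m else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running  # keep the largest deviation, with its sign
    return es
```

A positive score indicates that set members cluster at the top of the ranking (e.g. hypermethylated or upregulated), and a negative score that they cluster at the bottom.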
Fig. 2.
Differential analysis results for single-omic comparisons of patients with MDD and controls. Representation of results obtained for sex-specific analyses in male and female cohorts, with double volcano plots for differentially methylated CG sites (DNAm, panel a), differentially expressed miRNAs (b) or mRNAs (c), as well as counts of features with an adjusted p-value ≤0.1 (d; see main text for details).
For miRNAs, miR-124-3p was the only genome-wide significant hit in males (log2FC = 1.88, padj = 9.9 × 10−3, WT), and was similarly upregulated in females (log2FC = 2.08, padj = 0.054, WT; Fig. 2b). Interestingly, these findings are consistent with the considerable attention that this specific miRNA has received over the last few years as a candidate MDD biomarker.55 In females, 44 additional miRNAs were significantly dysregulated (23 up, 21 down). Comparison with results from previous studies,56 again using GSEA, showed significant enrichment for upregulated miRNAs in both our male and female cohorts (females: NES = 1.55, padj = 4.3 × 10−3; males: NES = 1.63, padj = 4.5 × 10−3, KS). These findings indicated that, similar to DNAm, our data captured part of a miRNA signal previously associated with MDD.
Regarding mRNAs, large numbers of features met genome-wide significance, with 2749 (1527 up, 1222 down) and 1775 (1159 up, 616 down) DEGs in females and males, respectively (FDR < 0.1; Fig. 2c and d). For validation, these results were compared with those from a recent large meta-analysis of MDD peripheral blood studies.57 Using GSEA, up- or downregulated DEGs from the latter study were significantly enriched in similar directions in our male cohort (up: NES = 1.53, padj = 1.6 × 10−3; down: NES = −1.60, padj = 1.6 × 10−3, KS), while no significant enrichment was found in females. Overall, these results provide external validation for each individual omic layer analysed in the present work.
Comparisons across females and males reveal shared and sex-specific MDD signatures
We next compared molecular differences associated with MDD in females and males. Concordance between sexes was low for all 3 omic modalities (Fig. 3a–c). Among features passing a relaxed nominal significance threshold (p < 0.05), only 2.6% of DNAm probes (n = 2674/102041), 7.0% of mRNAs (n = 483/6870) and 6.1% of miRNAs (n = 9/148) showed changes in the same direction in both sexes (Tables S2–S4). At such thresholds, specific DNAm probes and miRNAs showed differences in 1 sex only (with overlaps between MDD- and sex-associated changes observed for both molecular layers; Fig. S5), suggesting that future studies should more systematically explore their potential interactions. Overall, these results are consistent with the increasing recognition of large sex differences in MDD.58,59
Fig. 3.
Comparisons across females and males of molecular changes associated with MDD. a. Overlap of DMPs (nominal p-value ≤0.05, WT). b. Overlap of DEmiRNAs (nominal p-value ≤0.05, WT). c. Overlap of DEGs (nominal p-value ≤0.05, WT). d. Top functional enrichments of differentially methylated DNAm probes (DMPs passing nominal p-value ≤0.05), identified using missMethyl. Enrichments were computed by hypergeometric testing against Gene Ontology terms (see Methods), or against the present study's list of DEGs identified in male or female patients with MDD (last line, n = 2111). The figure depicts the number of genes in each enrichment, while arrows indicate whether enrichments originated from up- or down-regulated DMPs, or both (no arrow). e. Top functional enrichments of mRNA dysregulations (adjusted p-value ≤0.05, hypergeometric test) associated with MDD, identified using GSEA. Numbers at the end of each bar indicate the normalised enrichment score (NES), whose sign indicates whether enrichments originated from up- or down-regulated mRNAs. f–h. Heatmaps representing two-sided rank–rank hypergeometric overlap analyses (using the RRHO2 algorithm), and displaying threshold-free overlaps among molecular changes associated with MDD in male and female cohorts.
Next, we conducted functional enrichment analyses in males and females, starting with DNAm probes, using missMethyl with the Gene Ontology (GO) and KEGG databases. Results showed similar trends and concordant directionality in both sexes, involving terms related to neuronal function such as dendrite development and regulation of axonogenesis (see Fig. 3d, and Tables S6 and S7 for the distribution of probes along gene features). Interestingly, hypergeometric testing of these probes against the present study's list of DEGs showed significant overlaps, indicating convergence between the 2 omic layers. For mRNA, in comparison, gene sets enriched for MDD-related dysregulation (using GSEA with the KEGG database and KS tests) were relatively more divergent across sexes (Fig. 3e and Table S8). In females, they involved hormone synthesis and secretion, including thyroid/parathyroid hormones, ovarian steroids, aldosterone (NES = 1.84, FDR = 0.03) and cortisol (NES = 1.84, FDR = 0.02), as well as the glutamatergic synapse (NES = 1.86, FDR = 0.04). In males, they were notably related to: immune responses, such as phagocytosis (NES = 2.14, FDR < 10−4), Toll-like receptor signalling (NES = 1.67, FDR = 1.8 × 10−2), leukocyte transendothelial migration, necroptosis (NES = 1.72, FDR = 1.19 × 10−2) and the IL-17 signalling pathway (NES = 1.76, FDR = 7.22 × 10−3); energy metabolism, including oxidative phosphorylation (NES = 1.81, FDR = 3.81 × 10−3), glycolysis (NES = 1.71, FDR = 1.37 × 10−2), the pentose phosphate pathway (NES = 1.84, FDR = 2.96 × 10−3) and carbon metabolism (NES = 1.82, FDR = 3.99 × 10−3); the proteasome (NES = 2.21, FDR < 10−4); mTOR signalling (NES = 1.56, FDR = 4.59 × 10−2); and the synaptic vesicle cycle (NES = 1.88, FDR = 1.61 × 10−3). These results are consistent with previous peripheral blood studies of MDD, which identified similar GO terms related to the stress axis, immunity and brain neuronal physiology.17,25 They also document substantial sex differences in MDD, which we further characterised using RRHO2.
RRHO2 performs iterative hypergeometric testing for all combinations of ranking thresholds applied to the female and male ranked lists, generating “threshold-free” genome-wide comparisons (Fig. 3f–h, Table S9). Interestingly, this approach uncovered global patterns of similarity, wherein large groups of mRNAs, miRNAs and methylation probes exhibited dysregulation in the same direction in males and females with MDD. The most significant overlaps involved upregulated features (RRHO2 bottom-left quadrants; DNAm: −log10(p-val) = 410.3; miRNA: −log10(p-val) = 8.5; mRNA: −log10(p-val) = 29.1), with weaker but still significant overlaps also detected for downregulated ones (upper-right quadrants; DNAm: −log10(p-val) = 193.01; miRNA: −log10(p-val) = 1.58; mRNA: −log10(p-val) = 3.95). As such, while the most significantly affected features overlapped poorly (Fig. 3a–c), stronger sex concordance became detectable when considering larger groups of genes and probes that individually exhibited milder MDD-related changes.
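The threshold grid behind RRHO-type analyses can be sketched for the concordant "both upregulated" quadrant only. This is a minimal illustration of the rank–rank hypergeometric overlap idea, not a reproduction of the RRHO2 package (which also scans the down-regulated and discordant quadrants); the function names and the step size are hypothetical.

```python
from math import comb, log10

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    top = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return top / comb(N, n)

def rrho_up_up(rank_a, rank_b, step):
    """-log10 hypergeometric p-values for the overlap between the top-i features of
    rank_a and the top-j features of rank_b, over thresholds i, j in multiples of step.
    Both lists hold the same feature ids, ordered from most up- to most down-regulated
    (e.g. females in rank_a, males in rank_b)."""
    N = len(rank_a)
    grid = []
    for i in range(step, N + 1, step):
        top_a = set(rank_a[:i])
        row = []
        for j in range(step, N + 1, step):
            k = len(top_a.intersection(rank_b[:j]))
            p = hypergeom_sf(k, N, i, j)
            row.append(-log10(max(p, 1e-300)))  # floor guards against float underflow
        grid.append(row)
    return grid
```

Each cell of `grid` corresponds to one pixel of an RRHO heatmap; the maximal value summarises the strongest concordant overlap, analogous to the quadrant −log10(p-val) values reported above.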
Intriguingly, in addition to these changes occurring in the same direction, the RRHO2 analysis also identified, for the mRNA layer, groups of genes showing opposite changes in males and females (upper-left and bottom-right quadrants). While a global pattern of MDD-related molecular concordance across sexes is intuitive, as shown here in blood and previously described in the brain,29 this more restricted discordance is surprising. Another post-mortem brain study of MDD, however, lends support to this notion.30 In that meta-analysis, 52 genes were identified as consistently sex-discordant across 3 regions (dorsolateral prefrontal cortex, anterior cingulate cortex, basolateral amygdala). Of note, no significant overlap was detectable between this brain gene list and our own discordant genes in blood (Fig. S6), possibly reflecting the low number of genes involved, or differences across tissues. Overall, these results suggest a model whereby MDD may be associated with gene expression changes that predominantly show the same directionality in both sexes, while a discordant pattern may simultaneously affect smaller and potentially tissue-specific gene sets. Further work will be necessary to substantiate this hypothesis.
Stepwise network-based annotations identify male and female gene modules associated with MDD
As a first strategy to integrate the 3 omic layers, we used a step-by-step approach. WGCNA was applied to RNA-Seq data to construct gene coexpression networks, followed by annotation of gene modules using multiomic enrichments. This strategy builds on recent work on other psychiatric disorders.60 To provide external validation of the modules generated, their preservation was first assessed in blood transcriptomic data from participants in the GTEx consortium (whole blood),61 or from Krebs et al. (peripheral blood mononuclear cells).62 As expected, most modules showed evidence of preservation (z-summary > 2; Fig. S7a): this was the case for 96.3% and 88.8% of our male and female modules compared to GTEx, and 99.0% and 91.8% of our male and female modules compared to Krebs et al., respectively. In total, 135 and 111 modules were identified in females and males, respectively (Table S10). Interestingly, regardless of their preservation or association with MDD, these modules showed very different gene composition across sexes (as assessed using pairwise Jaccard indices, JI; Fig. S7b–d). Hence, there are substantial sex differences in the organisation of gene coexpression networks in blood, as already illustrated by other groups.58,63
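Pairwise Jaccard indices between female and male modules can be computed as below. Summarising each female module by its best-matching male module is one possible choice (an assumption, since the exact summary used for Fig. S7b–d is not described here), and the module names in the usage example are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| between two gene sets (0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_cross_sex_match(female_modules, male_modules):
    """For each female module, return (best-matching male module, its Jaccard index)."""
    return {
        f_name: max(
            ((m_name, jaccard(f_genes, m_genes)) for m_name, m_genes in male_modules.items()),
            key=lambda pair: pair[1],
        )
        for f_name, f_genes in female_modules.items()
    }

# Usage with hypothetical modules
female = {"F:ME1": ["A", "B", "C"]}
male = {"M:ME1": ["A", "B", "D"], "M:ME2": ["X"]}
matches = best_cross_sex_match(female, male)
```

Low best-match indices across most modules are what "very different gene composition across sexes" translates to in this metric.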
We then sought to identify the modules most relevant to MDD, primarily by quantifying their enrichment for MDD-related molecular differences observed in the present or in previous studies (Table S11). To do so, we computed a prioritisation score (see Methods) based on the following criteria: i) enrichment in mRNA or DNAm changes (GSEA, threshold-free), or in targets of DEmiRNAs; ii) association with MDD diagnosis, MDD severity (HDRS score), or a history of childhood trauma, a risk factor for MDD (CTQ score); and iii) enrichment in genetic variation associated with mood disorders or childhood trauma in previous GWAS (using MAGMA, see Methods and Table S11). The twenty modules (10 in each sex) with the highest prioritisation scores are presented in Fig. 4a. Across all modules, we found significant correlations among associations of module eigengenes with MDD status, HDRS score, mRNA dysregulation in MDD, and targets of miRNA-124-3p (Fig. S8), consistent with the notion that WGCNA coexpression and differential expression analyses capture partly overlapping phenomena.
Fig. 4.
Sex-specific WGCNA modules prioritised for their association with MDD. a. Circos plot of the top 10 MDD-related modules in females and males: each triangular section represents a module. Male modules are highlighted in blue, female ones in red. Concentric circles represent results from step-by-step enrichment or correlation tests conducted for prioritisation. From the outside to the inside, Circle 1 (C1): enrichment test for MDD-related mRNA dysregulation (normalised enrichment score, NES, GSEA); C2: DEmiRNA target enrichment test (log odds-ratio); C3: enrichment test for MDD-related DNA methylation dysregulation (NES, GSEA); C4-5: correlation tests between module eigengenes and MDD status (C4) or CTQ score (Childhood Trauma Questionnaire, C5); C6-9: enrichment tests for SNPs associated with MDD (C6), childhood trauma (C7), bipolar disorder type I (C8) or II (C9) in genome-wide association studies (computed using the MAGMA approach, see main text; p-value). A colour gradient was applied to each module enrichment that met statistical significance (p-value < 0.05). b. Graphical network representation of the 2 modules most strongly associated with MDD (M:ME48 and F:ME129). These 2 modules, as well as others, were enriched for genes identified in RRHO2 as showing opposite MDD-related mRNA changes in females and males (Fig. 3h). Circle-shaped vertices represent genes, triangle-shaped ones miRNAs. Grey edges correspond to weighted coexpression between genes, red ones to pairs of miRNAs and their known targets in the miRBase database. Vertex size is proportional to degree of connectivity.
We then focused on the 2 top modules for in-depth analysis (M:ME48 in males and F:ME129 in females) and their annotations. The male module M:ME48 was composed of 181 genes, and was significantly enriched in genes upregulated in males with MDD (NES = 2.10, padj = 1.1 × 10−3, KS). Conversely, the female module F:ME129, composed of 66 genes, was enriched in genes downregulated in females with MDD (NES = −2.12, padj = 3.4 × 10−3, KS). These 2 modules also shared enrichments for targets of the same 6 miRNAs (miRNA-124-3p, miRNA-532-3p, miRNA-92a-3p, miRNA-1270, miRNA-181d-5p and miRNA-4286), with only 2 additional miRNAs specifically associated with each module (miRNA-550a-3p and miRNA-320b for M:ME48; miRNA-4516 and miRNA-5585-3p for F:ME129). The network organisation of these 2 modules is presented in Fig. 4b, with DEGs and validated targets of DEmiRNAs highlighted as hubs within each module. Interestingly, the differentially expressed miRNA-124-3p (males: padj = 9.9 × 10−3, females: padj = 5.4 × 10−2, WT) appeared centrally located in both modules, with significant enrichment for its target genes (M:ME48: padj = 4.5 × 10−27; F:ME129: padj = 2.31 × 10−13, KS), suggesting that this miRNA may act as a regulator of these modules (and potentially others, see Fig. S9). Gene Ontology, KEGG and Reactome pathway analyses (Table S10) identified enrichments of M:ME48 in immune responses, including neutrophil-mediated immunity and neutrophil degranulation, while F:ME129 was enriched in terms primarily involved in cellular response to stress. Overall, this step-by-step network approach provides a sex-specific description of modular gene coexpression changes in MDD, and identifies the modules most significantly affected at multiomic levels.
Advanced integration provides more accurate and sex-specific multiomic biomarkers of MDD
Finally, to fully leverage our genome-wide and multiomic data, we implemented a supervised integration framework designed to identify features that discriminate patients with MDD from healthy controls, building on Momix18 and SNF19 (Fig. 5, and Methods). The SNF approach computes patient similarity matrices for each omic modality, before merging them into a single network using a nonlinear method based on message passing theory. While the method can handle all available features, a subset may be particularly relevant to MDD. We therefore first applied multiomic dimensionality reduction methods from Momix18 to identify the top 10% of features that co-varied most with MDD severity, and benchmarked 6 jDR methods: RGCCA, JIVE, MCIA, MOFA, intNMF and SciKit-Fusion. These features were then extracted to generate a fused similarity matrix (across omics) and to cluster individuals into two groups in the train dataset using SNF, with parameters optimised to predict MDD/control status (i.e., to maximise the AUC, the performance metric; see Methods). The SNF label propagation procedure was then applied to the test dataset to assign each new individual to a cluster. For comparison, the AUC was also computed using sets of features corresponding to results from single-omic differential analyses (Diff), or without any selection (No selection). Importantly, to quantify how considering sex or an increasing number of omic layers may improve MDD/control classification, the whole procedure was independently applied to the female cohort, the male cohort, and their combination, as well as to every combination of 1, 2 or 3 omic types. To control for overfitting and biases due to cohort size or covariate corrections, an end-to-end repeated 5-fold CV was conducted, while feature stability was evaluated through bootstrapping.
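The fusion step of SNF can be sketched with NumPy. This is a simplified re-implementation of the update rules described in the original SNF publication (ref. 19), assuming Euclidean distances and using the hypothetical parameter names `k` (neighbours), `t` (iterations) and `mu` (kernel hyperparameter). It omits details of the reference implementations (e.g. feature standardisation and additional normalisation steps) and the final spectral clustering into K = 2 groups, and should not be read as the exact code used in this study.

```python
import numpy as np

def affinity(X, k=20, mu=0.5):
    """Scaled exponential similarity between samples (rows of X)."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))  # Euclidean distances
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)  # mean distance to k nearest neighbours
    eps = (knn_mean[:, None] + knn_mean[None, :] + d) / 3
    return np.exp(-d ** 2 / (mu * eps))

def full_kernel(W):
    """Row-normalised kernel P: diagonal 1/2, off-diagonal entries of each row sum to 1/2."""
    off = W - np.diag(np.diag(W))
    P = off / (2 * off.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P

def local_kernel(W, k=20):
    """Sparse kernel S keeping each row's k largest similarities, rows summing to 1."""
    idx = np.argsort(-W, axis=1)[:, :k]
    rows = np.arange(W.shape[0])[:, None]
    S = np.zeros_like(W)
    S[rows, idx] = W[rows, idx]
    return S / S.sum(axis=1, keepdims=True)

def snf(views, k=20, t=20, mu=0.5):
    """Fuse per-omic (samples x features) matrices into one (samples x samples) similarity matrix."""
    Ws = [affinity(X, k, mu) for X in views]
    Ps = [full_kernel(W) for W in Ws]
    Ss = [local_kernel(W, k) for W in Ws]
    m = len(views)
    for _ in range(t if m > 1 else 0):  # a single view needs no cross-diffusion
        new_Ps = []
        for v in range(m):
            others = sum(Ps[u] for u in range(m) if u != v) / (m - 1)
            # diffuse the other views' similarities through view v's kNN graph
            P = Ss[v] @ others @ Ss[v].T
            new_Ps.append((P + P.T) / 2)  # symmetrise, a common practical step
        Ps = new_Ps
    fused = sum(Ps) / m
    return (fused + fused.T) / 2
```

The simultaneous update (every view uses the previous iteration's matrices) is what lets information shared across omics reinforce itself, while similarities supported by a single omic fade.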
Fig. 5.
Summary of the multiomic integration and classification framework. For each of the 25 splits of a 5-fold cross validation (with 5 repetitions), every combination of 1, 2 or 3 types of omic data, as well as clinical severity of depression (scores from the 17-item Hamilton rating scale for depression) were given as inputs to 6 different joint dimension reduction methods (jDR). For each train dataset, the resulting factor matrices were then correlated with the MDD/control status, and the top 10% of omic features contributing to the best factor (i.e., showing highest correlation with MDD) were extracted to construct new matrices with only those selected features. A multiomic similarity matrix was then generated for each train set, using SNF, in order to infer 2 clusters of individuals. Finally, covariate correction and feature selection from each train set were applied to each corresponding test set, and the SNF label propagation procedure applied to predict the group of new individuals in each test set.
Results are presented in Fig. 6. The best AUCs were obtained with features identified by JIVE and RGCCA, the 2 jDR methods that, importantly, classified patients and controls significantly better than Diff and No selection (Fig. 6a). In comparison, features prioritised by intNMF, MCIA and SciKit-Fusion did not provide such improvements, while MOFA outperformed No selection but not Diff (see Discussion for more details). For downstream analyses, we therefore focused on JIVE and RGCCA and, using bootstrapping, assessed the stability of the features they identified. Stable mRNA and DNAm features (see Table S12 for full lists) significantly overlapped across different omic combinations, as well as within, but not across, sex-specific cohorts (Fig. S10). Stable features also converged with results obtained previously using single-omic or step-by-step network strategies. Accordingly, the 2 top modules M:ME48 and F:ME129, as well as 8/10 and 4/10 of the female and male MDD-related modules, were significantly enriched for stable mRNA features (Fig. S11). Interestingly, additional gene modules not prioritised during the step-by-step network approach also showed significant enrichment for stable features, suggesting an informational gain provided by this third multiomic strategy.
Fig. 6.
Comparisons of MDD/control classification performance across different combinations of omic data, and stratification by sex. a. Boxplot of AUCs (Area Under the Curve) obtained for each of 7 feature selection methods: differential analyses, Diff; jive; rgcca; intNMF; mcia; mofa; scikit; or no selection. Mean AUCs for each method were first compared to those obtained using differential analyses (p-values in purple) or without any selection (p-values in red, t-tests). AUCs were computed on the test data for each of the 25 splits of the cross-validation (5-fold, with 5 repetitions) for every combination of 1, 2 or 3 types of omic data, for males, females, or pooled cohorts. b. Detailed representation of the performance achieved by the 2 best joint dimension reduction methods (JIVE and RGCCA) according to sex stratification and the number of omic data considered (p-values correspond to pair-wise comparisons).
Finally, we investigated the impact of sex and multiomic aggregation on MDD prediction. First, stratification by sex significantly improved performance, as shown by greater AUCs, in both the female (p = 2.7 × 10−16, t-test) and male (p = 0.045, t-test) cohorts compared to the pooled one (Fig. 6b). Consistent with the sex differences described above at the single-omic level and during network integration, these results reinforce the importance of accounting for sex when developing MDD biomarkers. Second, comparisons across omic combinations showed that AUCs progressively increased with the number of omic layers used: performance tended to improve from 1 to 2 (p = 0.13, t-test) and from 2 to 3 omics (p = 0.079, t-test), and was significantly better with 3 than with 1 omic (p = 4.8 × 10−3, t-test). Overall, these results indicate that multiomic panels of biomarkers have the potential to improve the prediction of MDD.
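The t-test variant is not specified in the text; the sketch below assumes Welch's unequal-variance test applied to two lists of per-split AUCs, and computes only the statistic and its Welch-Satterthwaite degrees of freedom using the standard library. A two-sided p-value could then be obtained from the t distribution, e.g. with `2 * scipy.stats.t.sf(abs(t), df)`. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors of each mean
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

For example, `welch_t(auc_female, auc_pooled)` would compare hypothetical lists of test AUCs from the sex-specific and pooled analyses.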
Discussion
By combining transcriptomic data and two layers of epigenomic regulation derived from peripheral blood samples, the present work delineated MDD-associated molecular signatures. Our findings showed that the resulting biomarkers were more predictive when identified specifically in each sex, and that multiomic integration performed better than single-omic analyses.
Comparisons with previous studies first provided external validation for each omic layer. At the DNA methylation level, changes in patients with MDD showed enrichments for neuronal physiology, including dendrite morphogenesis and neurotrophin signalling. This is in line with previous studies of MDD and other neuropsychiatric disorders, which suggest that, in addition to pathophysiological mechanisms occurring in the brain, parallel molecular adaptations may also be detectable in blood.64, 65, 66 Regarding miRNAs, results from multiple analyses converged on the brain-enriched miR-124-3p, which was upregulated in patients with MDD of both sexes, was centrally located within the two modules prioritised in females and males by network integration, and also belonged to stable miRNA features during multiomic integration (in males, when considering miRNA and DNAm with RGCCA). This is consistent with previous work, as this miRNA has been extensively investigated in recent years in rodent models of depression, and found to be significantly upregulated in the blood of patients with MDD.67,68
Our study was conducted with systematic and careful consideration of sex. Females and males with MDD show substantial differences in clinical presentation, course, rates or types of psychotropic drug prescription,69 and response to antidepressant treatment.70,71 As such, pathophysiological molecular processes, as well as associated peripheral biomarkers, are expected to differ as a function of sex. Importantly, although most MDD studies include sex as a covariate, only a few have directly quantified its impact, and did so in brain tissue only.58,59,72 Here we extended such consideration to peripheral blood, across 3 omic modalities. Results from various analytical strategies systematically pointed toward important differences. At the single-omic level, the features most strongly impacted by MDD differed significantly between females and males, with a degree of convergence that only emerged when relaxing significance thresholds (RRHO2). This was further reflected in functional enrichment, particularly for the mRNA layer: in males, enriched terms included immune system responses, the IL-17 signalling pathway and Toll-like receptors; in females, glutamatergic signalling, production and secretion of thyroid and parathyroid hormones, aldosterone and ovarian steroidogenesis (KEGG pathways). Interestingly, sex variations in some of these responses have already been documented in MDD: in males, increased levels of immune response proteins (including C-reactive protein) were observed,73 and inflammatory indicators were better predictors of the condition.74 Regarding gonadal steroids, although their interactions with stress regulation have been well characterised in rodents, only a few human studies suggest that they may mediate the stronger link observed in females between life stress and MDD.75, 76, 77 Our results provide molecular data that appear consistent with these notions, and will need to be confirmed in larger cohorts.
Strong sex differences further emerged during our second (network) and third (multiomic) integrative strategies. The gene composition of modules identified using WGCNA showed low concordance across sexes, regardless of their association with MDD. This was expected, as similar differences have been previously described by the GTEx consortium in blood and other tissues.63,78 More surprisingly, among the 2 modules most significantly associated with MDD, some of the observed dysregulations occurred in opposite directions in males and females (Fig. 4), and features prioritised for case/control classification exhibited clear sex specificity. Taken together, these findings extend previous evidence by suggesting that, across 3 omic layers, MDD may manifest with distinct, and possibly partly opposite, molecular biomarkers in females and males. Importantly, the performance of our multiomic prediction of MDD status significantly increased when features were identified in each sex independently, despite the associated reduction in sample size and statistical power. This indicates that, in the future, considering sex may help improve the identification of reliable MDD biomarkers, and that a better understanding of these differences may help prioritise sex-specific and potentially more efficient therapeutic options.
The second main aspect of the present work was the integrative analysis of multiple omic layers. To do so, we implemented a supervised selection of molecular features (jDR), followed by SNF for prediction of clusters of individuals. Previously, Bhak and colleagues16 used supervised machine learning (random forest) to differentiate 56 suicide attempters, 39 patients with MDD, and 87 controls based on gene expression and methylation data, and reported accuracies around 0.9, depending on the clinical groups considered. However, the features used to optimise classifier models corresponded solely to differentially expressed and methylated loci. This first step of feature selection was not embedded in the training procedure, suggesting that the leave-one-out CV procedure used to test classifiers may have overestimated their performance. In our framework, to limit overfitting, covariate correction and feature selection were both conducted independently within each of the 25 CV splits (see Fig. S2), followed by bootstrap analysis to test for feature stability. Importantly, because this methodology was applied to a cohort assessed in parallel for 3 omic layers, we were able to systematically quantify the informational gain attainable for predicting MDD status when considering 1, 2, or 3 layers. Among the 6 jDR approaches used for feature selection, JIVE, RGCCA and MOFA generated the best results. Of note, JIVE and RGCCA outperformed features corresponding to single-omic differential analyses (DEGs, DEmiRNAs or DMPs). Compared with case–control differential analyses, these results therefore argue for the added value, when trying to predict MDD status, of integrative methods that consider covariance between multiomic measures and clinical severity. Further, they document the quantitative relationship between multiomic aggregation and classification power in MDD.
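The selection bias described above can be reproduced on simulated pure-noise data. This sketch uses a nearest-centroid classifier and mean-difference feature ranking as stand-ins (not the random forest or jDR methods discussed here) and compares leave-one-out accuracy when features are selected once on the full dataset versus re-selected within each training fold; all names and sizes are illustrative.

```python
import numpy as np

def top_features(X, y, n_keep):
    """Indices of the n_keep features with the largest absolute class mean difference."""
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return np.argsort(-np.abs(diff))[:n_keep]

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test sample to the class with the closer centroid (squared Euclidean)."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def loo_accuracy(X, y, n_keep, select_inside):
    """Leave-one-out accuracy, with feature selection outside (leaky) or inside each fold."""
    if not select_inside:
        keep = top_features(X, y, n_keep)  # leaky: selection has already seen every test sample
    correct = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        if select_inside:
            keep = top_features(X[train], y[train], n_keep)  # honest: training samples only
        pred = nearest_centroid_predict(X[train][:, keep], y[train], X[i:i + 1, keep])
        correct += int(pred[0] == y[i])
    return correct / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2000))  # pure noise: no real case/control signal
y = np.repeat([0, 1], 20)
leaky = loo_accuracy(X, y, 20, select_inside=False)
honest = loo_accuracy(X, y, 20, select_inside=True)
```

On such data, the leaky variant typically reports high accuracy even though no signal exists, whereas the variant with selection embedded in each fold stays around chance level.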
These results advocate for the identification and design of multiomic panels of biomarkers. Beyond diagnosis, such panels are expected to contribute to stratification based on severity, identification of patient subgroups, or prediction of treatment response, with the long-term goal of reaching clinical utility and personalised medicine. To this end, fully leveraging the unsupervised nature of SNF to identify subgroups of patients, introducing sparsity during feature selection (with e.g. SGCCA,79 netSGCCA,80 or PathME),81 applying techniques such as kernel-based approaches or deep learning to address nonlinear interactions (as in cancer research),82 and integrating time-series data, environmental and lifestyle factors, all represent appealing perspectives to model the complex and dynamic interactions underlying MDD.
This study has limitations, starting with cohort size. With around 170 individuals, and despite a significant boost from multiomic integration, prediction performance remained arguably modest, and its generalisability should be tested in larger replication cohorts, ideally characterised across a broader panel of omic layers. Other limitations stem from potential biases associated with the imputation of missing covariate data, and from restricting multiomic integration to participants with complete omic blocks. Also, while we rigorously analysed potential confounding variables and adjusted for factors such as sex, age, BMI, and blood cell composition, a more in-depth statistical analysis of interactions between MDD and sex is warranted, as is consideration of additional factors. Among these, although prior studies have highlighted an influence of smoking, only a marginal effect was detected in our cohort. Similarly, while psychotropic medications likely affect blood molecular measures, they were not considered, owing to their heterogeneity across our naturalistic cohort and to sample size limitations. Therefore, we cannot rule out that hidden effects of some of these factors contribute to part of the associations reported in the present work. Finally, at a methodological level, fine-tuning each jDR method (percentage of selected features, number of dimensions, method-specific parameters), or using nested CV, may improve prediction performance.
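The nested CV mentioned above separates hyperparameter tuning from performance estimation. As a minimal sketch under stated assumptions (hypothetical toy data with a weak planted signal; the number of retained features `k` as the only tuned hyperparameter, standing in for the percentage of selected features in a jDR method; a nearest-centroid classifier; arbitrary fold counts), an inner CV run on each outer training split chooses `k`, and the outer held-out fold, never seen during tuning, provides the estimate.

```python
import random
import statistics


def make_data(n_samples=60, n_features=200, n_signal=10, shift=1.0, seed=0):
    """Hypothetical toy data: the first `n_signal` features are shifted by `shift` in class 1."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    rng.shuffle(y)
    X = [[rng.gauss(shift if (j < n_signal and y[i] == 1) else 0.0, 1.0)
          for j in range(n_features)] for i in range(n_samples)]
    return X, y


def rank_features(X, y, rows):
    """Feature indices sorted by |class-mean difference|, estimated on `rows` only."""
    def score(j):
        a = [X[i][j] for i in rows if y[i] == 1]
        b = [X[i][j] for i in rows if y[i] == 0]
        return abs(statistics.fmean(a) - statistics.fmean(b))
    return sorted(range(len(X[0])), key=score, reverse=True)


def fit_and_score(X, y, train, test, k):
    """Select the top-k features on `train`, fit class centroids there, score on `test`."""
    feats = rank_features(X, y, train)[:k]
    cents = {c: [statistics.fmean(X[i][j] for i in train if y[i] == c) for j in feats]
             for c in (0, 1)}
    correct = 0
    for i in test:
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(feats, cents[c])) for c in (0, 1)}
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(test)


def split(rows, n_folds, seed):
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    return [rows[f::n_folds] for f in range(n_folds)]


def nested_cv(X, y, k_grid, outer_folds=5, inner_folds=4, seed=0):
    outer_scores, chosen_k = [], []
    for outer_test in split(range(len(X)), outer_folds, seed):
        held_out = set(outer_test)
        outer_train = [i for i in range(len(X)) if i not in held_out]

        def inner_score(k):
            # Inner CV uses the outer training rows only: the outer test fold is never seen.
            scores = []
            for inner_test in split(outer_train, inner_folds, seed + 1):
                inner_held = set(inner_test)
                inner_train = [i for i in outer_train if i not in inner_held]
                scores.append(fit_and_score(X, y, inner_train, inner_test, k))
            return statistics.fmean(scores)

        best_k = max(k_grid, key=inner_score)
        chosen_k.append(best_k)
        # Refit with the chosen k on the whole outer training split, score the untouched fold.
        outer_scores.append(fit_and_score(X, y, outer_train, outer_test, best_k))
    return statistics.fmean(outer_scores), chosen_k


X, y = make_data(seed=3)
accuracy, chosen = nested_cv(X, y, k_grid=(5, 10, 50))
print(f"nested-CV accuracy: {accuracy:.2f}; k chosen per outer fold: {chosen}")
```

The chosen `k` may differ between outer folds; the reported accuracy describes the whole tuning procedure rather than a single fixed setting.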
In conclusion, the present study provides evidence that implementing sex-specific and multiomic strategies will be instrumental in developing clinically useful biomarkers of MDD.
Contributors
AM and AG carried out the statistical analyses and performed multiomic integration. ECI managed the storage of blood samples with associated clinical and epidemiological data and extracted total RNA and genomic DNA. HV, GC, BL and ECI conceived microRNA sequencing experiments, prepared libraries and contributed to data analyses. CD contributed to RNA sequencing analyses. DC performed analyses of DNA methylation results. MD contributed to SNF analyses. CCB and MD verified the code and helped with the visualisation of the results for the manuscript. ECI, IY, CMC, BE, RB, ADD and PEL contributed funding and resources. AM, ADD and PEL wrote the manuscript. AM, AG, CCB, ADD and PEL have accessed and verified the underlying data. All authors read and approved the final version of the manuscript.
Data sharing statement
Raw and processed data are publicly available via the Gene Expression Omnibus with accession GSE251786 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE251786). Code for most of the routines, as well as for RRHO2 parallelisation and advanced multiomic integration, is available on GitHub: https://github.com/INSERM-U1141-Neurodiderot/multiomics_MDD.
Declaration of interests
BE received grants from ‘Agence Nationale de la Recherche (ANR)’ and consulting fees from Sanofi Winthrop. All other authors declare no competing interests.
Acknowledgements
RNA sequencing was performed by the GenomEast platform (Illkirch, France), member of the ‘France Génomique’ consortium (ANR-10-INBS-0009), and the ICM (Institut du Cerveau et de la Moelle Épinière, Paris, France). Small-RNA sequencing was performed by the TGML platform (Marseille, France), supported by grants from INSERM, GIS IBiSA, Aix-Marseille Université, and ANR-10-INBS-0009-10. DNA methylation arrays were performed by Diagenode (Liège, Belgium). The authors would also like to acknowledge the CAIUS High Performance Computing Center of the University of Strasbourg for providing scientific support and access to computing resources. Part of the computing resources were funded by the Equipex Equip@Meso project (Programme Investissements d'Avenir) and the CPER Alsacalcul/Big Data. The authors also thank Alba Caparros-Roissard for help with data management.
Footnotes
Supplementary data related to this article can be found at https://doi.org/10.1016/j.ebiom.2025.105569.
Contributor Information
Andrée Delahaye-Duriez, Email: andree.delahaye@inserm.fr.
Pierre-Eric Lutz, Email: pierre-eric.lutz@cnrs.fr.
Appendix A. Supplementary data
References
- 1. Hasin D.S., Sarvet A.L., Meyers J.L., et al. Epidemiology of adult DSM-5 major depressive disorder and its specifiers in the United States. JAMA Psychiatry. 2018;75:336–346. doi: 10.1001/jamapsychiatry.2017.4602.
- 2. Tam J., Mezuk B., Zivin K., Meza R. U.S. simulation of lifetime major depressive episode prevalence and recall error. Am J Prev Med. 2020;59(2):e39–e47. doi: 10.1016/j.amepre.2020.03.021.
- 3. Gutiérrez-Rojas L., Porras-Segovia A., Dunne H., Andrade-González N., Cervilla J.A. Prevalence and correlates of major depressive disorder: a systematic review. Rev Bras Psiquiatr. 2020;42:657–672. doi: 10.1590/1516-4446-2019-0650.
- 4. Albert P.R. Why is depression more prevalent in women? J Psychiatry Neurosci. 2015;40:219–221. doi: 10.1503/jpn.150205.
- 5. Marx W., Penninx B.W.J.H., Solmi M., et al. Major depressive disorder. Nat Rev Dis Primers. 2023;9:44. doi: 10.1038/s41572-023-00454-1.
- 6. Jaffe D.H., Rive B., Denee T.R. The humanistic and economic burden of treatment-resistant depression in Europe: a cross-sectional study. BMC Psychiatry. 2019;19:247. doi: 10.1186/s12888-019-2222-4.
- 7. Mariani N., Cattane N., Pariante C., Cattaneo A. Gene expression studies in depression development and treatment: an overview of the underlying molecular mechanisms and biological processes to identify biomarkers. Transl Psychiatry. 2021;11:354. doi: 10.1038/s41398-021-01469-6.
- 8. Fries G.R., Saldana V.A., Finnstein J., Rein T. Molecular pathways of major depressive disorder converge on the synapse. Mol Psychiatry. 2023;28:284–297. doi: 10.1038/s41380-022-01806-1.
- 9. Seney M.L., Glausier J., Sibille E. Large-scale transcriptomics studies provide insight into sex differences in depression. Biol Psychiatry. 2022;91:14–24. doi: 10.1016/j.biopsych.2020.12.025.
- 10. Parel S.T., Peña C.J. Genome-wide signatures of early-life stress: influence of sex. Biol Psychiatry. 2022;91:36–42. doi: 10.1016/j.biopsych.2020.12.010.
- 11. Khramtsova E.A., Wilson M.A., Martin J., et al. Quality control and analytic best practices for testing genetic models of sex differences in large populations. Cell. 2023;186:2044–2061. doi: 10.1016/j.cell.2023.04.014.
- 12. Sullivan P.F., Neale M.C., Kendler K.S. Genetic epidemiology of major depression: review and meta-analysis. Am J Psychiatry. 2000;157:1552–1562. doi: 10.1176/appi.ajp.157.10.1552.
- 13. Penner-Goeke S., Binder E.B. Epigenetics and depression. Dialogues Clin Neurosci. 2019;21:397–405. doi: 10.31887/DCNS.2019.21.4/ebinder.
- 14. Rappoport N., Shamir R. Multi-omic and multi-view clustering algorithms: review and cancer benchmark. Nucleic Acids Res. 2019;47:1044. doi: 10.1093/nar/gky1226.
- 15. Li C.X., Wheelock C.E., Sköld C.M., Wheelock Å.M. Integration of multi-omics datasets enables molecular classification of COPD. Eur Respir J. 2018;51. doi: 10.1183/13993003.01930-2017.
- 16. Bhak Y., Jeong H.O., Cho Y.S., et al. Depression and suicide risk prediction models using blood-derived multi-omics data. Transl Psychiatry. 2019;9:262. doi: 10.1038/s41398-019-0595-2.
- 17. Montano C.M., Irizarry R.A., Kaufmann W.E., et al. Measuring cell-type specific differential methylation in human brain tissue. Genome Biol. 2013;14:R94. doi: 10.1186/gb-2013-14-8-r94.
- 18. Cantini L., Zakeri P., Hernandez C., et al. Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer. Nat Commun. 2021;12:124. doi: 10.1038/s41467-020-20430-7.
- 19. Wang B., Mezlini A.M., Demir F., et al. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods. 2014;11:333–337. doi: 10.1038/nmeth.2810.
- 20. Consoloni J.-L., Ibrahim E.C., Lefebvre M.-N., et al. Serotonin transporter gene expression predicts the worsening of suicidal ideation and suicide attempts along a long-term follow-up of a Major Depressive Episode. Eur Neuropsychopharmacol. 2018;28:401–414. doi: 10.1016/j.euroneuro.2017.12.015.
- 21. Bernstein D.P., Fink L., Handelsman L., et al. Initial reliability and validity of a new retrospective measure of child abuse and neglect. Am J Psychiatry. 1994;151:1132–1136. doi: 10.1176/ajp.151.8.1132.
- 22. Tian Y., Morris T.J., Webster A.P., et al. ChAMP: updated methylation analysis pipeline for Illumina BeadChips. Bioinformatics. 2017;33:3982–3984. doi: 10.1093/bioinformatics/btx513.
- 23. Bushnell B. BBMap: a fast, accurate, splice-aware aligner. 2014. https://www.osti.gov/biblio/1241166
- 24. Dobin A., Davis C.A., Schlesinger F., et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- 25. Anders S., Pyl P.T., Huber W. HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–169. doi: 10.1093/bioinformatics/btu638.
- 26. Zhao S., Gordon W., Du S., et al. QuickMIRSeq: a pipeline for quick and accurate quantification of both known miRNAs and isomiRs by jointly processing multiple samples from microRNA sequencing. BMC Bioinformatics. 2017;18:180. doi: 10.1186/s12859-017-1601-4.
- 27. Kozomara A., Birgaoanu M., Griffiths-Jones S. miRBase: from microRNA sequences to function. Nucleic Acids Res. 2019;47:D155–D162. doi: 10.1093/nar/gky1141.
- 28. Hsu S.-D., Tseng Y.-T., Shrestha S., et al. miRTarBase update 2014: an information resource for experimentally validated miRNA-target interactions. Nucleic Acids Res. 2014;42:D78–D85. doi: 10.1093/nar/gkt1266.
- 29. Hoffman G.E., Schadt E.E. variancePartition: interpreting drivers of variation in complex gene expression studies. BMC Bioinformatics. 2016;17:483. doi: 10.1186/s12859-016-1323-z.
- 30. Houseman E.A., Accomando W.P., Koestler D.C., et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics. 2012;13:86. doi: 10.1186/1471-2105-13-86.
- 31. Zhou W., Laird P.W., Shen H. Comprehensive characterization, annotation and innovative use of Infinium DNA methylation BeadChip probes. Nucleic Acids Res. 2017;45. doi: 10.1093/nar/gkw967.
- 32. Leek J.T., Johnson W.E., Parker H.S., Jaffe A.E., Storey J.D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28:882–883. doi: 10.1093/bioinformatics/bts034.
- 33. Love M.I., Huber W., Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8.
- 34. Maksimovic J., Oshlack A., Phipson B. Gene set enrichment analysis for genome-wide DNA methylation data. Genome Biol. 2021;22:173. doi: 10.1186/s13059-021-02388-x.
- 35. Subramanian A., Tamayo P., Mootha V.K., et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102.
- 36. Korotkevich G., Sukhov V., Budin N., Shpak B., Artyomov M.N., Sergushichev A. Fast gene set enrichment analysis. bioRxiv. 2016. doi: 10.1101/060012.
- 37. Cahill K.M., Huo Z., Tseng G.C., Logan R.W., Seney M.L. Improved identification of concordant and discordant gene expression signatures using an updated rank-rank hypergeometric overlap approach. Sci Rep. 2018;8:9588. doi: 10.1038/s41598-018-27903-2.
- 38. Lutz P.-E., Chay M.-A., Pacis A., et al. Non-CG methylation and multiple histone profiles associate child abuse with immune and small GTPase dysregulation. Nat Commun. 2021;12:1132. doi: 10.1038/s41467-021-21365-3.
- 39. Becker L.J., Fillinger C., Waegaert R., et al. The basolateral amygdala-anterior cingulate pathway contributes to depression-like behaviors and comorbidity with chronic pain behaviors in male mice. Nat Commun. 2023;14:2198. doi: 10.1038/s41467-023-37878-y.
- 40. Langfelder P., Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559. doi: 10.1186/1471-2105-9-559.
- 41. Abbassi-Daloii T., Kan H.E., Raz V., ’t Hoen P.A.C. Recommendations for the analysis of gene expression data to identify intrinsic differences between similar tissues. Genomics. 2020;112:3157–3165. doi: 10.1016/j.ygeno.2020.05.026.
- 42. de Leeuw C.A., Mooij J.M., Heskes T., Posthuma D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput Biol. 2015;11. doi: 10.1371/journal.pcbi.1004219.
- 43. Shungin D., Winkler T.W., Croteau-Chonka D.C., et al. New genetic loci link adipose and insulin biology to body fat distribution. Nature. 2015;518:187–196. doi: 10.1038/nature14132.
- 44. Dalvie S., Maihofer A.X., Coleman J.R.I., et al. Genomic influences on self-reported childhood maltreatment. Transl Psychiatry. 2020;10:38. doi: 10.1038/s41398-020-0706-0.
- 45. Lam M., Chen C.-Y., Li Z., et al. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat Genet. 2019;51:1670–1678. doi: 10.1038/s41588-019-0512-x.
- 46. Stahl E.A., Breen G., Forstner A.J., et al. Genome-wide association study identifies 30 loci associated with bipolar disorder. Nat Genet. 2019;51:793–803. doi: 10.1038/s41588-019-0397-8.
- 47. Wray N.R., Ripke S., Mattheisen M., et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat Genet. 2018;50:668–681. doi: 10.1038/s41588-018-0090-3.
- 48. Fortin J.-P., Cullen N., Sheline Y.I., et al. Harmonization of cortical thickness measurements across scanners and sites. Neuroimage. 2018;167:104–120. doi: 10.1016/j.neuroimage.2017.11.024.
- 49. neurocombat-sklearn. PyPI. https://pypi.org/project/neurocombat-sklearn/0.1.2a0/
- 50. Gloaguen A., Philippe C., Frouin V., et al. Multiway generalized canonical correlation analysis. Biostatistics. 2022;23(1):240–256. doi: 10.1093/biostatistics/kxaa010.
- 51. Efron B. Better bootstrap confidence intervals. J Am Stat Assoc. 1987;82:171–185.
- 52. Benjamini Y., Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Stat Methodol. 1995;57:289–300.
- 53. Kang J., Lienhard M., Pastor W.A., et al. Simultaneous deletion of the methylcytosine oxidases Tet1 and Tet3 increases transcriptome variability in early embryogenesis. Proc Natl Acad Sci U S A. 2015;112:E4236–E4245. doi: 10.1073/pnas.1510510112.
- 54. Tao Y., Zhang H., Jin M., et al. Co-expression network of mRNA and DNA methylation in first-episode and drug-naive adolescents with major depressive disorder. Front Psychiatry. 2023;14. doi: 10.3389/fpsyt.2023.1065417.
- 55. Dwivedi Y. microRNA-124: a putative therapeutic target and biomarker for major depression. Expert Opin Ther Targets. 2017;21:653–656. doi: 10.1080/14728222.2017.1328501.
- 56. van den Berg M.M.J., Krauskopf J., Ramaekers J.G., Kleinjans J.C.S., Prickaerts J., Briedé J.J. Circulating microRNAs as potential biomarkers for psychiatric and neurodegenerative disorders. Prog Neurobiol. 2020;185. doi: 10.1016/j.pneurobio.2019.101732.
- 57. Wittenberg G.M., Greene J., Vértes P.E., Drevets W.C., Bullmore E.T. Major depressive disorder is associated with differential expression of innate immune and neutrophil-related gene networks in peripheral blood: a quantitative review of whole-genome transcriptional data from case-control studies. Biol Psychiatry. 2020;88:625–637. doi: 10.1016/j.biopsych.2020.05.006.
- 58. Labonte B., Engmann O., Purushothaman I., et al. Sex-specific transcriptional signatures in human depression. Nat Med. 2017;23:1102–1111. doi: 10.1038/nm.4386.
- 59. Seney M.L., Huo Z., Cahill K., et al. Opposite molecular signatures of depression in men and women. Biol Psychiatry. 2018;84:18–27. doi: 10.1016/j.biopsych.2018.01.017.
- 60. Gandal M.J., Zhang P., Hadjimichael E., et al. Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder. Science. 2018;362. doi: 10.1126/science.aat8127.
- 61. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013;45:580–585. doi: 10.1038/ng.2653.
- 62. Krebs C.E., Ori A.P.S., Vreeker A., et al. Whole blood transcriptome analysis in bipolar disorder reveals strong lithium effect. Psychol Med. 2020;50:2575–2586. doi: 10.1017/S0033291719002745.
- 63. Hartman R.J.G., Mokry M., Pasterkamp G., den Ruijter H.M. Sex-dependent gene co-expression in the human body. Sci Rep. 2021;11. doi: 10.1038/s41598-021-98059-9.
- 64. Clark S.L., Hattab M.W., Chan R.F., et al. A methylation study of long-term depression risk. Mol Psychiatry. 2020;25:1334–1343. doi: 10.1038/s41380-019-0516-z.
- 65. Li Q.S., Morrison R.L., Turecki G., Drevets W.C. Meta-analysis of epigenome-wide association studies of major depressive disorder. Sci Rep. 2022;12. doi: 10.1038/s41598-022-22744-6.
- 66. Li M., Li Y., Qin H., et al. Genome-wide DNA methylation analysis of peripheral blood cells derived from patients with first-episode schizophrenia in the Chinese Han population. Mol Psychiatry. 2021;26:4475–4485. doi: 10.1038/s41380-020-00968-0.
- 67. Roy B., Dunbar M., Shelton R.C., Dwivedi Y. Identification of MicroRNA-124-3p as a putative epigenetic signature of major depressive disorder. Neuropsychopharmacology. 2017;42(4):864–875. doi: 10.1038/npp.2016.175.
- 68. Musazzi L., Mingardi J., Ieraci A., Barbon A., Popoli M. Stress, microRNAs, and stress-related psychiatric disorders: an overview. Mol Psychiatry. 2023;28:4977–4994. doi: 10.1038/s41380-023-02139-3.
- 69. Seifert J., Führmann F., Reinhard M.A., et al. Sex differences in pharmacological treatment of major depressive disorder: results from the AMSP pharmacovigilance program from 2001 to 2017. J Neural Transm (Vienna). 2021;128:827–843. doi: 10.1007/s00702-021-02349-5.
- 70. LeGates T.A., Kvarta M.D., Thompson S.M. Sex differences in antidepressant efficacy. Neuropsychopharmacology. 2019;44:140–154. doi: 10.1038/s41386-018-0156-z.
- 71. Moderie C., Nuñez N., Fielding A., Comai S., Gobbi G. Sex differences in responses to antidepressant augmentations in treatment-resistant depression. Int J Neuropsychopharmacol. 2022;25:479–488. doi: 10.1093/ijnp/pyac017.
- 72. Eid R.S., Gobinath A.R., Galea L.A.M. Sex differences in depression: insights from clinical and preclinical studies. Prog Neurobiol. 2019;176:86–102. doi: 10.1016/j.pneurobio.2019.01.006.
- 73. Ramsey J.M., Cooper J.D., Bot M., et al. Sex differences in serum markers of major depressive disorder in The Netherlands study of depression and anxiety (NESDA). PLoS One. 2016;11. doi: 10.1371/journal.pone.0156624.
- 74. Ernst M., Brähler E., Otten D., et al. Inflammation predicts new onset of depression in men, but not in women within a prospective, representative community cohort. Sci Rep. 2021;11:2271. doi: 10.1038/s41598-021-81927-9.
- 75. Slavich G.M., Sacher J. Stress, sex hormones, inflammation, and major depressive disorder: extending Social Signal Transduction Theory of Depression to account for sex differences in mood disorders. Psychopharmacology (Berl). 2019;236:3063–3079. doi: 10.1007/s00213-019-05326-9.
- 76. Kokras N., Hodes G.E., Bangasser D.A., Dalla C. Sex differences in the hypothalamic-pituitary-adrenal axis: an obstacle to antidepressant drug development? Br J Pharmacol. 2019;176:4090–4106. doi: 10.1111/bph.14710.
- 77. Handa R.J., Weiser M.J. Gonadal steroid hormones and the hypothalamo-pituitary-adrenal axis. Front Neuroendocrinol. 2014;35:197–220. doi: 10.1016/j.yfrne.2013.11.001.
- 78. Lopes-Ramos C.M., Chen C.Y., Kuijjer M.L., et al. Sex differences in gene expression and regulatory networks across 29 human tissues. Cell Rep. 2020;31. doi: 10.1016/j.celrep.2020.107795.
- 79. Tenenhaus A., Philippe C., Guillemot V., Le Cao K.A., Grill J., Frouin V. Variable selection for generalized canonical correlation analysis. Biostatistics. 2014;15:569–583. doi: 10.1093/biostatistics/kxu001.
- 80. Chegraoui H., Guillemot V., Rebei A., et al. Integrating multiomics and prior knowledge: a study of the Graphnet penalty impact. Bioinformatics. 2023;39. doi: 10.1093/bioinformatics/btad454.
- 81. Lemsara A., Ouadfel S., Fröhlich H. PathME: pathway based multi-modal sparse autoencoders for clustering of patient-level multi-omics data. BMC Bioinformatics. 2020;21:146. doi: 10.1186/s12859-020-3465-2.
- 82. Franco E.F., Rana P., Cruz A., et al. Performance comparison of deep learning autoencoders for cancer subtype detection using multi-omics data. Cancers. 2021;13(9):2013. doi: 10.3390/cancers13092013.