Abstract
Immune-mediated inflammatory diseases (IMIDs) comprise a complex group of pathologies with diverse etiologies and clinical manifestations. In particular, omics technologies have remodeled our understanding of a set of IMIDs such as systemic autoimmune rheumatic diseases (SARDs), generating vast amounts of data on the genome, epigenome, transcriptome, proteome and metabolome of immune cells and SARDs patients. However, the integration of omics data to advance our knowledge of these diseases is challenging, requiring advanced bioinformatic tools. This review explores different multi-omic integrative tools for refining previous research, exploring the biological relevance of datasets within different contexts, or translating omics results into clinical advances. We also discuss relevant multi-omic studies in SARDs research and the potential of omics data from available repositories to complement ongoing investigation in this field.
Keywords: Omics, Integration, Bioinformatics, Immune-mediated inflammatory diseases
1. Introduction
Immune-mediated inflammatory diseases (IMIDs) are characterized by an inappropriate or excessive immune response that results in pathogenic conditions. Clinically, these conditions are complex diseases characterized by heterogeneous manifestations, with many individuals initially experiencing vague systemic symptoms due to immune dysregulation, which persist throughout the course of the disease [1]. These conditions are often challenging to treat and are typically managed through long-term continuous anti-inflammatory therapy. The persistent immune imbalance also makes IMIDs patients prone to co-morbidities. This group encompasses a diverse range of diseases, including more prevalent conditions such as type 1 diabetes (T1D), inflammatory bowel disease (IBD) or rheumatoid arthritis (RA) as well as less common conditions like systemic lupus erythematosus (SLE) or systemic sclerosis (SSc) [2]. It is worth noting that, in the present review, our primary emphasis will be on omics studies related to systemic autoimmune rheumatic diseases (SARDs).
The study of molecular pathways in diseases has undergone a significant paradigm shift in recent years. For instance, historically in genetic studies, researchers primarily focused on studying candidate genes, specific genes suspected to be linked with particular diseases [3]. Although this approach yielded notable discoveries, it frequently disregarded the intricate interplay of various genetic factors and their interactions with environmental influences [4]. With the advancement of technology and the subsequent decrease in costs, scientists have acquired the capability to analyze entire genomes, providing a more comprehensive view of disease pathogenesis [5], [6]. This shift towards omics research has initiated an era of unparalleled understanding and exploration in the field of biomedicine.
Omics studies have played a pivotal role in developing precise and efficient techniques for investigating autoimmune diseases. In an omics approach, a specific layer of biological information is obtained and studied without prior filtering or data targeting. This methodology aims to encompass as much information as possible, avoiding specific associations and hypotheses. Omic sciences are named after the biological layers they cover: genetics, epigenetics, transcription levels, proteins, or metabolites, leading to genomics, epigenomics, transcriptomics, proteomics, and metabolomics, respectively. These approaches provide a comprehensive understanding of the systemic mechanisms involved in the development and prognosis of these diseases. Given the intricate nature of SARDs, both at the molecular and clinical level, these state-of-the-art methodologies are essential to advance our understanding of the pathogenic pathways underlying these diseases [7].
The high heterogeneity and multifactorial character of the molecular mechanisms in IMIDs make the correct diagnosis and treatment of patients a complex and time-consuming process. Among patients with SARDs, millions of people take medications on a daily basis which, although they appear to be beneficial, may be ineffective or even trigger serious complications, aggravating their previous condition [8]. The intricacy of IMIDs also affects diagnosis, which is mainly based on clinical manifestations, serum autoantibodies and pathological findings. It is common for patients who meet similar criteria to have different signaling pathways underlying the disease, leading to misdiagnosis [1]. Personalized medicine aims to recognize the environmental and immunological background particular to each patient, in order to find immunological biomarkers that facilitate correct diagnosis and allow the use of more effective treatments with fewer side effects [9]. The integration of omics and multi-omics approaches is fundamental for enhancing our understanding of disease pathogenesis and translating this knowledge into improved clinical practices [10], [11].
Through the combination of data derived from single-omics studies, a more intricate and realistic portrayal of SARDs can be attained, surpassing the inherent limitations associated with specific omics techniques. In Fig. 1, we have outlined various objectives attainable through the integration of omics data, a notable portion of which pertains to emerging approaches in SARDs research. A diverse array of bioinformatics tools exists for this purpose. In this review, we have categorized relevant tools in the field, encompassing widely recognized tools utilized in SARDs research as well as innovative tools displaying significant potential. By incorporating this approach into the established autoimmunity research standards, we may be able to improve our understanding of disease pathogenesis and molecular pathways, and contribute to the development of better treatments and preventive strategies for this group of diseases.
Fig. 1.
Different layers of omics data and their potential application to SARDs research.
2. Designing a multi-omic integrative study
The adoption of multi-omic strategies has spurred the creation of diverse tools, techniques, and platforms for analyzing, visualizing, and interpreting multi-omics data. The complexities associated with omics integration give rise to numerous technical and analytical challenges. Hence, it is crucial to engage in meticulous and strategic planning to successfully conduct integrative studies. We illustrated the general workflow of a multi-omic study in Fig. 2, including pertinent aspects to cover at each stage.
Fig. 2.
General stages of a multi-omic integration project and relevant verifications for each stage.
The initial challenge in an integrative analysis is designing experiments accurately. Some methods work with paired data, where all omics information derives from the same individual, biological sample, and collection time point. However, other tools can handle sparse datasets with limited overlap between omics layers. The choice of experimental design depends on the research scope. Paired data are preferred for a holistic understanding of systems biology, while unpaired data are more suitable for comparative analyses and meta-analyses of omics data from various samples or measurements [12], [13].
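As a minimal sketch of the paired-design check described above, the following Python snippet reports which samples are present in every omics layer and what fraction of each layer is fully paired. The layer names and sample identifiers are hypothetical and serve only as an illustration.

```python
def paired_samples(layers):
    """Return the sorted sample IDs that are present in every omics layer."""
    id_sets = [set(samples) for samples in layers.values()]
    return sorted(set.intersection(*id_sets)) if id_sets else []


def paired_fraction(layers):
    """Fraction of each layer's samples that have data in all other layers."""
    paired = set(paired_samples(layers))
    return {
        name: len(paired & set(samples)) / len(set(samples))
        for name, samples in layers.items()
        if samples
    }


# Hypothetical sample sheets, one list of sample IDs per omics layer.
layers = {
    "genome": ["P01", "P02", "P03", "P04"],
    "transcriptome": ["P02", "P03", "P04", "P05"],
    "methylome": ["P03", "P04", "P06"],
}

print(paired_samples(layers))   # ['P03', 'P04']
print(paired_fraction(layers))
```

A low paired fraction would suggest that tools tolerating sparse or unpaired data are the more realistic choice for the study.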
Data heterogeneity is another crucial aspect to consider. Diverse experimental platforms and technologies cause computational complexity, analysis biases, and challenges in achieving robust, integrative, and reproducible analyses [14]. Fortunately, some tools have proven capable of overcoming this challenge. Efforts have focused on addressing multicollinearity [15], reducing data dimensionality [16], and solving integration problems that arise when dealing with multi-omics and non-omics data [17]. Specific preprocessing steps for individual datasets may not necessarily be helpful, particularly when trying to integrate them into a unified framework. This step remains a challenging aspect that requires ongoing work.
Another important factor is distinguishing experimental noise from biological variability in multidimensional datasets. After initial data processing and normalization, the data are visualized in order to detect outliers, technical sources of variability (commonly referred to as batch effects), and obvious biological patterns at each level of analysis. Linear and nonlinear clustering methods (k-means, density-based or graph-based) are used to accomplish this task, followed by the generation of visual representations using dimensionality reduction techniques. This approach is often challenging due to the complex analytic space and variability in multi-omics data relationships [18], [19].
Although multi-omics analysis is typically not computationally demanding, it can escalate in complexity due to extensive extract, transform, and load (ETL) operations when managing data from large-scale consortia. Moreover, recently developed supervised multi-omic software can impose a significant computational load, due to the exhaustive feature-level analysis involved. To mitigate these computational bottlenecks, local high-performance computing (HPC) resources or cloud-based infrastructure can be leveraged. In that sense, information processing environments such as Apache Hadoop are deployed in research to address challenges related to scalability and accuracy. Furthermore, cloud computing infrastructures like Microsoft Azure, Amazon Web Services or Google Cloud Platform offer paid access to parallel computing frameworks, which optimize processing times by distributing data processing across multiple machines while enhancing storage efficiency through parallel and in-memory processing. However, cloud repositories raise significant privacy and security concerns and often face data quality issues [20].
Managing voluminous downloaded data files can be effectively addressed through the application of raw data streaming functionality. However, this function is only supported by a limited set of tools. Currently, a large number of multi-omic analysis methods are readily accessible through programming languages like Python or R, enhancing their ease of implementation and adoption [21]. Utilizing packages designed for parallelizing computationally intensive processes is an effective strategy to tackle the most time-consuming steps of multi-omic analyses. Examples include packages such as parallel (R), data.table (R), multidplyr (R), BiocParallel (R/Bioconductor) or scikit-learn (Python).
2.1. Available repositories of omics data
There are numerous resources hosting omics data that can be highly valuable for integration analyses and should be considered when designing a multi-omic study. We compiled the most used and relevant omic repositories that may be used in the investigation of autoimmune diseases in Table 1. However, platforms specifically dedicated to data related to IMIDs or SARDs pathogenic samples are scarce, making it challenging to access relevant data in this field. Thus, it is important to acknowledge repositories such as ADEx (Autoimmune Disease Explorer) [22], where numerous datasets pertaining to actual SARDs omic studies can be downloaded.
Table 1.
Available repositories of omics data.
| Data repository | Link | Ref. | Omics data available | Immune specific data | Data availability |
|---|---|---|---|---|---|
| GTEx | https://gtexportal.org/home/ | [23] | Gene-expression data for 50+ tissues, with expression QTLs and splicing QTLs per tissue. | Spleen and whole blood samples | Available for download |
| ROADMAP | https://www.encodeproject.org | [24] | Epigenomic datasets: Various RNA-seq data, DNase-Seq, ATAC-Seq, TF ChIP-Seq, Histone ChIP-Seq, among others | Immune tissues, extensive data for immune cell types | Available for download, also web-based interface for data exploration |
| Ensembl | http://www.ensembl.org/ | [25] | Genomics level functional information and annotations, Variant Effect Predictor tool, Ensembl Regulatory Build for region annotation | - | Available for download, also web-based interface for single requests |
| SNPNexus | https://www.snp-nexus.org/v4/ | [26] | Web-based tool for genetic variant annotation using data from public repositories | - | Web-based resource, data is not available for download |
| UKBiobank | https://www.ukbiobank.ac.uk/ | [27] | Vast individual-level genomics database with exhaustive clinical data | Numerous IMIDs and SARDs patients’ genomics data available along with pertinent clinical data | Legal and financial requirements for download |
| Finngen Biobank | https://www.finngen.fi/en | [28] | Genomics and clinical data for a substantial number of participants from the Finnish population | Numerous IMIDs and SARDs patients’ genomics data available along with pertinent clinical data | Legal and financial requirements for download |
| ChEMBL | https://www.ebi.ac.uk/chembl/ | [29] | Extensive datasets regarding bioactive molecules, including metabolites and protein interactions | Data concerning immune tissues, cells, and fundamental immune molecules | Available for download |
| Human Cell Atlas | https://data.humancellatlas.org/ | [30] | Community-based repository of single-cell data generated in omics studies; specific data availability varies by study | Single-cell datasets of immune cell types and datasets from IMIDs and SARDs patients regarding different samples | Available for download |
| dbGaP | https://www.ncbi.nlm.nih.gov/gap/ | [31] | Association studies data between genotype and phenotype | Association results from IMIDs and SARDs studies | Data availability varies by study |
| GWAS Catalog | https://www.ebi.ac.uk/gwas/ | [32] | Community-based repository of summary statistics from GWAS | GWAS results from IMIDs and SARDs studies | Data availability varies by study, no requirements for download |
| Omics Discovery Index (OmicsDI) | https://www.omicsdi.org/ | [33] | Genomics, transcriptomics, proteomics, and metabolomics data, interconnected with various single-omics repositories | Omics data from IMIDs and SARDs studies, data from immune tissues and cell types | Data availability varies by study, no requirements for download |
| GEO | https://www.ncbi.nlm.nih.gov/geo/ | [34] | Mainly gene expression datasets from Microarrays and next-generation sequencing, but harbors also other forms of high-throughput functional genomics data submitted by the research community. | Omics data from IMIDs and SARDs studies, data from immune tissues and cell types | Available for download |
| SRA | https://www.ncbi.nlm.nih.gov/sra | [35] | Raw sequencing data and alignment information | Omics data from IMIDs and SARDs studies, data from immune tissues and cell types | Legal requirements for download |
eQTL, expression quantitative trait loci; GWAS, genome-wide association study; IMID, immune-mediated inflammatory disease; TF, transcription factor; SARD, systemic autoimmune rheumatic disease.
These resources typically contain data from different technologies, such as GWAS summary statistics, microarray data, RNA-seq data, whole exome sequencing, and ChIP-seq, among others, as well as single-cell data. They provide a centralized, easily accessible repository for researchers to retrieve omics data, eliminating the need for individual labs to replicate experiments and encouraging data sharing and collaboration. In addition, they often adhere to data standards and formats, ensuring that data are well organized and interoperable. This standardization simplifies data analysis and improves data quality. Moreover, many omics resources are committed to long-term data preservation, ensuring that data remain available for future generations of researchers. This is crucial for the reproducibility of scientific findings. Finally, these repositories often have guidelines and policies in place to ensure the ethical and legal use of the data, protecting the rights and privacy of the individuals who contributed to the datasets.
3. Omics integration tools
In this review, we followed a pragmatic approach in which we categorized omics integration tools into three primary types, according to their objective: 1) refinement tools, those employed to strengthen a pre-existing hypothesis, advancing or validating previous results, 2) contextual tools, designed to facilitate the admixture of data from diverse contexts, enabling exploratory analyses and unveiling new insights, and 3) translation tools, focused on the application of omics results to improve clinical practice. This classification was developed intending to cover relevant tools in the field while keeping in mind the purpose of their application. Table 2 provides a summary of the various tools discussed in this review, including their primary data type and main application.
Table 2.
Overview of the integration tools covered in this review.
| Category | Tool | Primary type of data | Application |
|---|---|---|---|
| Refinement tools | PAINTOR | GWAS summary statistics | GWAS fine mapping |
| Refinement tools | MatrixEQTL | Gene expression or methylation datasets | QTL identification |
| Refinement tools | WASP | RNA-seq datasets | Allele-specific QTL identification |
| Refinement tools | PLASMA | RNA-seq and ChIP-seq datasets | Allele-specific QTL identification |
| Refinement tools | sc-spectrum | Multi-omic single-cell datasets | Single-cell clustering |
| Contextual tools | GARFIELD | GWAS summary statistics | Cell enrichment analysis |
| Contextual tools | LDSC-SEG | Gene-expression datasets | Cell enrichment analysis |
| Contextual tools | ASSET | GWAS summary statistics | Cross-disease genomic studies |
| Contextual tools | intePareto | RNA-seq and ChIP-seq datasets | Correlation between gene changes and conditions |
| Contextual tools | miARma | mRNA and miRNA datasets | miRNA/mRNA integration |
| Contextual tools | scDRS | GWAS summary statistics and single-cell RNA-seq dataset | Disease-pertinent cells identification |
| Contextual tools | NicheNET | Single-cell or bulk RNA-seq dataset | Cell-cell communication |
| Contextual tools | TwoSampleMR | GWAS summary statistics | Mendelian randomization |
| Translation tools | NETTAG | GWAS summary statistics | Drug repurposing |
| Translation tools | AnnoPred | Individual-level genomics data | PRS for diagnosis |
| Translation tools | G-PROB | Individual-level genomics data | PRS for patient triage |
| Translation tools | DIABLO | Diverse omics datasets | Biomarker discovery |
| Translation tools | ONE-Sense | Multimodal single-cell datasets | Biomarker discovery |
| Translation tools | ASGARD | RNA-seq single-cell | Single-cell-based drug repurposing |
ChIP-seq, chromatin immunoprecipitation sequencing; GWAS, genome-wide association study; miRNA, micro ribonucleic acid; mRNA, messenger ribonucleic acid; PRS, polygenic risk score; QTL, quantitative trait loci; RNA-seq, RNA sequencing.
To show the potential of each type of tool, we featured omics integration examples in each section. In our selection, we have prioritized methodologies that allow the integration of private data with data from available repositories, as well as those that have already been applied to SARDs research.
3.1. Refinement tools
Omics data can be used to further support results in a study by adding relevant layers of information that reinforce the primary methodology. A canonical example of this approach is the fine-mapping analysis of GWAS results. Identifying the actual causal variant within an associated region is an inevitable challenge of the GWAS methodology, because linkage disequilibrium (LD) blocks limit the ability to determine precisely which variant is causing the observed effect. However, in GWAS research, it is now common practice to introduce other omics layers to identify variants known to have functional consequences in the regulation or expression of the candidate genes, facilitating the task of defining the most likely causal variants. In this section, in addition to fine mapping, we explain various tools that can provide support to ongoing studies, employing an integrative approach to overcome the intrinsic limitations of individual omics.
3.1.1. PAINTOR: GWAS results fine mapping
PAINTOR was developed to address the challenge posed by LD blocks in the analysis of GWAS outcomes [36]. PAINTOR serves as a fine-mapping tool, facilitating the incorporation of pertinent data into the variant prioritization process through a Bayesian framework. This aids in assessing the likelihood of the biological causality of different variants within the linked genomic regions associated with a particular trait or condition. The primary result yielded by PAINTOR is a posterior probability (PP) value, which quantifies the probability that each variant is responsible for the observed association. Subsequently, this information enables the identification of credible sets, the smallest groups of variants that collectively have a probability exceeding 95% of containing the causal variant. This shows great potential for efficiently reducing the numerous associated genomic variants to concise credible sets. The tool leverages the information contained in functional annotations, as well as the GWAS results. Thus, virtually any data that can serve as functional annotation has the potential to be employed for variant prioritization. This has resulted in PAINTOR being commonly used with general annotations about gene regulation (like histone marks) or expression quantitative trait loci (eQTL) data, which allows the use of data from databases like ROADMAP [24] or GTEx [23]. It is noteworthy that PAINTOR does not intrinsically account for the relevance of the diverse functional annotations integrated within the analysis. Consequently, it is advisable to maintain consistency in both the quality and relevance of the employed annotations.
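The credible-set definition above can be illustrated with a short Python sketch. It is not part of PAINTOR: it only shows how a 95% credible set could be derived from a table of posterior probabilities once these are available. The variant names and PP values are hypothetical, and the normalization step assumes a single causal variant per locus.

```python
def credible_set(posteriors, coverage=0.95):
    """Smallest set of variants whose cumulative posterior probability
    reaches `coverage`, assuming one causal variant in the locus."""
    total = sum(posteriors.values())
    # Rank variants from the most to the least likely causal one.
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, pp in ranked:
        chosen.append(variant)
        cumulative += pp / total  # normalize so the probabilities sum to 1
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical posterior probabilities for five variants in one locus.
pp_by_variant = {
    "variant_A": 0.62,
    "variant_B": 0.25,
    "variant_C": 0.09,
    "variant_D": 0.03,
    "variant_E": 0.01,
}

print(credible_set(pp_by_variant))  # ['variant_A', 'variant_B', 'variant_C']
```

In this toy locus, five associated variants are reduced to a credible set of three, which is the kind of reduction that makes functional follow-up tractable.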
3.1.2. MatrixEQTL: genetic variation-gene expression correlation
MatrixEQTL [37] is a statistical method and software package to identify associations between genetic variation and gene expression levels. It is particularly applied to understanding the genetic basis of complex traits and diseases. Furthermore, it has also been successfully used to study mQTLs (methylation QTLs) and thus relate changes in the methylation of CpG sites, usually in gene regulatory regions, to polymorphic variations [38].
Thus, MatrixEQTL is capable of evaluating the relationship between genotype and gene expression or methylation profiles through linear regression, accommodating both additive and ANOVA genotype effects. These models can incorporate covariates to address factors such as population stratification, gender, and clinical variables. Moreover, it provides support for models involving heteroscedastic and/or correlated errors, false discovery rate estimation, and distinct treatment of local (cis) and distant (trans) eQTLs.
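To make the additive model concrete, the sketch below fits a simple linear regression of expression on genotype dosage (0, 1 or 2 copies of the alternative allele) for a single variant-gene pair. It is written in plain Python for illustration and does not call MatrixEQTL, which is an R package; covariates and multiple-testing correction are deliberately left out, and the data are invented.

```python
import math


def additive_eqtl(genotypes, expression):
    """Least-squares fit of expression ~ genotype dosage.
    Returns the slope (effect size), its standard error and the t statistic."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    # Residual sum of squares, with n - 2 degrees of freedom.
    rss = sum((e - intercept - slope * g) ** 2 for g, e in zip(genotypes, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else math.inf
    return slope, se, t_stat


# Hypothetical data: allele dosage and normalized expression for six samples.
dosage = [0, 0, 1, 1, 2, 2]
expr = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]

slope, se, t_stat = additive_eqtl(dosage, expr)
print(f"effect per allele = {slope:.2f}, SE = {se:.3f}, t = {t_stat:.1f}")
```

Running this test across all variant-gene pairs is what makes the problem computationally heavy, and is the step that dedicated software is designed to speed up.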
3.1.3. WASP & PLASMA: allele-specific QTLs
Determining the causal variant behind the observed effects remains a challenge when identifying quantitative trait loci (QTLs). Leveraging allele-specific (AS) QTLs increases the statistical power of QTL identification and is particularly advantageous in scenarios where individual genome-wide genotype data are inaccessible [39]. WASP [40] provides an accessible framework for conducting such analyses, utilizing RNA-seq and ChIP-seq data. This tool features an improved read-mapping pipeline and corrects excessively dispersed distributions of reads with sample-specific and locus-specific parameters, effectively tackling the previously problematic high false positive rate. PLASMA [41], which also accepts RNA-seq and ChIP-seq data, is a fine-mapping tool for AS analysis. Its main advantage is that it also addresses the complexity of the LD within the reported loci, identifying causal variants or credible sets with a precision that outperformed several previous tools. Furthermore, PLASMA provides an output that is compatible with other statistical fine-mapping tools, and allows AS analysis in relatively small cohorts.
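The core idea of an allele-specific test can be sketched as follows: at a heterozygous site, reads are expected to split evenly between the two alleles, and a strong departure from that balance suggests a cis-regulatory effect. The example uses an exact two-sided binomial test in plain Python. This is a deliberate simplification and an assumption of this sketch: it ignores the overdispersion that tools such as WASP model explicitly, so it would be expected to produce more false positives on real data.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads, alt_reads, expected_ref_fraction=0.5):
    """Exact two-sided binomial test for allelic imbalance at one
    heterozygous site (no overdispersion modelled)."""
    n = ref_reads + alt_reads
    p = expected_ref_fraction
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Sum the probability of every outcome at least as unlikely as the observed one.
    tail = sum(pr for pr in probs if pr <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Hypothetical read counts at two heterozygous sites.
balanced = allelic_imbalance_pvalue(20, 20)
skewed = allelic_imbalance_pvalue(30, 10)
print(f"balanced site p = {balanced:.3f}; skewed site p = {skewed:.4f}")
```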
3.1.4. Sc-spectrum: Multimodal single-cell clustering
Integrating multi-omic layers of single-cell data into the single-cell analysis pipeline continues to pose unique challenges. In recent years, different technologies have emerged that allow the simultaneous collection of multi-omic layers of data from a single-cell sample, for example, collecting immunophenotyping data in addition to RNA profiling. Sc-spectrum [42] is a clustering tool developed for this purpose, unifying two of the main algorithms previously used: the weighted nearest neighbor (WNN) algorithm and the spectral clustering on multilayer graphs (SCML) algorithm. This novel approach provides the opportunity to contrast the efficacy of both algorithms. Additionally, it is noteworthy that this methodology also facilitates the incorporation of more than two data layers, provided they are available for the same dataset.
3.1.5. Applications of refinement tools in SARDs
Refinement approaches are gaining prominence in SARDs research, as they provide the means to delve deeper into the analysis of various datasets without requiring an entirely new experimental design. As an example, these tools could also be applied to obtain more precise results before designing in vivo validation experiments of previous omics studies. In genomics, the GWAS pipeline is now commonly accompanied by what is known as “post-GWAS” analyses. These analyses often involve multi-omic approaches mostly aimed at gaining a better understanding of the functional consequences of the associated regions and at clarifying the genetic cause of this mechanism. For instance, for RA [43] and SSc [44], fine mapping allowed the definition of credible sets in most GWAS-associated loci, enabling a more focused and better-grounded discussion of the possible implications of these regions. The MatrixEQTL framework has been utilized in the investigation of several IMIDs, such as Sjögren’s syndrome [45], where it was used to identify eQTLs linked to the development of this disease. In the case of RA, this tool has been used to analyze pathways linked to anti-TNF treatment response [46]. It also helped in the identification of mQTLs to characterize methylation patterns among RA patients, depending on their response to methotrexate treatment [47].
The allele-specific approach has demonstrated significant promise in unraveling intricate regulatory mechanisms in autoimmunity. For instance, this methodology unveiled a dynamic regulatory process concerning HLA variants and the timing of CD4+ T cell stimulation [48]. These findings have advanced our comprehension of how established SARDs-associated genetic regions may impact immune pathways [49].
3.2. Contextual tools
Another pioneering application of omics data integration involves combining various layers of information to facilitate the exploration of data in a context distinct from its original production, allowing a novel interpretation of the results. In this section, we highlighted tools that, other than giving more plausibility to a previous hypothesis, are designed for exploratory purposes, enabling the formulation of more complex interpretations of the observed patterns of single-omics datasets.
3.2.1. GARFIELD & LDSC-SEG: Cell enrichment GWAS analysis
Many of the disease-associated regions identified through GWAS are situated within regulatory regions. This often makes it challenging to pinpoint the underlying impact on pathogenic pathways and their cellular context. Consequently, cell enrichment analyses are commonly employed as post-GWAS tools to ascertain which cell types are more likely to be affected by the regions that have been identified. These tools facilitate the formulation of pathogenic pathways regarding the implicated loci by exploring their potential consequences within a cell-specific context. GARFIELD [50] is a well-established cell enrichment tool for this purpose. By default, this tool can calculate a general enrichment using publicly available databases, complementing the analysis with an exhaustive supplementary visualization of the results. It is important to note that GARFIELD has the capability to calculate an effective number of independent annotations, preventing overcorrection of results due to multiple testing. While this tool can generate comprehensive cell enrichment analyses for general purposes, it also allows the integration of user-provided functional annotations. Another tool, LDSC-SEG [51], serves a similar purpose but uses a distinct approach. LDSC-SEG leverages raw gene-expression data (the most common publicly available type of data) and stratified LD scores to assess enrichments within various cellular contexts. One particularly remarkable feature of this tool is its correction of cell enrichment compared to other annotations within the same organ, rather than a general statistical correction. This methodology has the potential to pinpoint more specific cell types that play a role in the pathogenic processes.
3.2.2. ASSET: Cross-disease genomic meta-analysis
Cross-disease GWAS meta-analysis involves examining the common factors that underlie multiple diseases [52]. This approach is particularly advantageous for enhancing the statistical power of genetic studies focused on less well-understood diseases, as it operates within a framework that enables the exploration of broader pathogenic mechanisms. ASSET [8] is a noteworthy tool proficient in integrating GWAS summary statistics. Its distinctive feature lies in its ability to assess the significance of associations by considering all potential subsets of the data, meaning that it can identify loci that impact either all the studied diseases, just some of them, or even loci that affect various diseases in opposite ways. A number of cross-disease GWAS meta-analyses in autoimmune diseases have used ASSET, identifying the presence of common genetic factors among these pathologies [53], [54], [55], [56]. This methodology has shown considerable potential in elucidating common mechanisms across various diseases, making it an especially interesting candidate for enhancing our understanding of diseases that are under-studied or under-represented.
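The subset-based idea can be illustrated with a toy Python sketch that, for one variant, runs a fixed-effect inverse-variance meta-analysis on every non-empty subset of diseases and keeps the subset with the strongest signal. This is only a simplified illustration and not the ASSET implementation: it does not correct the final statistic for the search over subsets, it ignores sample overlap between studies, and it does not perform the separate search for opposite-direction effects. The effect sizes are invented.

```python
from itertools import combinations
from math import sqrt


def best_subset(effects):
    """effects maps disease -> (beta, standard error) for one variant.
    Returns the subset of diseases with the largest |z| and that z score."""
    names = sorted(effects)
    top_subset, top_z = None, 0.0
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            weights = [1 / effects[d][1] ** 2 for d in subset]
            pooled = sum(w * effects[d][0] for w, d in zip(weights, subset)) / sum(weights)
            z = pooled * sqrt(sum(weights))  # pooled beta divided by its SE
            if abs(z) > abs(top_z):
                top_subset, top_z = subset, z
    return top_subset, top_z


# Hypothetical per-disease summary statistics (beta, SE) for a single variant.
variant_effects = {"RA": (0.20, 0.04), "SLE": (0.18, 0.05), "SSc": (0.01, 0.06)}

subset, z_score = best_subset(variant_effects)
print(subset, round(z_score, 2))
```

In this toy case the variant is assigned to RA and SLE but not SSc, since adding a disease with no effect dilutes the pooled signal.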
3.2.3. intePareto: Integration of RNA-Seq and ChIP-Seq data
intePareto [57] is implemented as an R package, offering a user-friendly workflow for the quantitative integration of RNA-Seq and ChIP-Seq data. This tool provides users with essential utilities for both data types, relying primarily on kallisto [58] for RNA-Seq analysis and on BWA [59] and samtools [60] for processing raw ChIP-Seq data. Subsequently, it aligns and matches intermediate data at the gene level, performs differential analysis across different biological conditions, and prioritizes genes exhibiting consistent changes in both RNA-Seq and ChIP-Seq data through Pareto optimization. In practical terms, Pareto optimization identifies solutions to multi-objective problems in which no objective can be improved without worsening at least one other. These algorithms aim to pinpoint a collection of solutions that represent an optimal compromise among the objectives, as opposed to a single optimal solution. Consequently, this tool facilitates the establishment of both positive and negative (active/repressive marks) relationships between expressed genes and their regulatory mechanisms.
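The notion of Pareto optimality used for gene prioritization can be sketched in a few lines of Python. Each gene receives one score per objective, here assumed to be the magnitude of change in RNA-Seq and in one ChIP-Seq mark, and a gene belongs to the Pareto front when no other gene is at least as good on both objectives and strictly better on one. The gene names and scores are hypothetical, and the scoring scheme is an assumption of this sketch rather than the one implemented in intePareto.

```python
def dominates(a, b):
    """True if score tuple `a` is at least as high as `b` on every
    objective and strictly higher on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Genes whose score tuple is not dominated by any other gene."""
    return sorted(
        gene
        for gene, score in scores.items()
        if not any(dominates(other, score) for name, other in scores.items() if name != gene)
    )


# Hypothetical (RNA-Seq change, ChIP-Seq change) scores per gene.
gene_scores = {
    "GENE_1": (3.0, 0.5),
    "GENE_2": (2.0, 2.0),
    "GENE_3": (0.5, 3.0),
    "GENE_4": (1.0, 1.0),  # dominated by GENE_2
    "GENE_5": (2.5, 0.4),  # dominated by GENE_1
}

print(pareto_front(gene_scores))  # ['GENE_1', 'GENE_2', 'GENE_3']
```

Removing the first front and repeating the procedure yields successive fronts, which gives a ranking of genes without having to collapse the objectives into a single weighted score.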
3.2.4. miARma: simultaneous study of miRNA and mRNA expression data
miARma-Seq [61] is a tool capable of analyzing RNA omics data, whether they are miRNAs, mRNAs, or even circular RNAs. Starting from raw data, it conducts quality assessments, removes adapter sequences and, if necessary, aligns reads, quantifies them, and performs differential expression and functional enrichment analyses.
Moreover, it is capable of integrating mRNA and miRNA data based on two approaches. The first stems from the idea that, since miRNAs are RNA molecules that preferentially bind to the 3'-UTR of mRNAs to inhibit their expression, it is possible to look for negative correlations between these two datasets (overexpressed miRNAs that inhibit mRNAs, as well as inhibited miRNAs that promote mRNA overexpression). The second approach assumes that, as the binding between miRNAs and mRNAs is not random, there are numerous parameters governing the possible binding of these RNAs to each other, such as sequence complementarity, the location of the binding site in the UTR, and the thermodynamics required for favored binding. Accordingly, there are databases that compile experimentally validated information on the binding of miRNAs to mRNAs. In addition, there are programs that predict, based on the mentioned parameters, whether binding between two such RNAs would be possible. Among these databases, we highlight miRGate [62] for containing information from five prediction methods, as well as four databases with experimentally validated targets.
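The first approach, searching for negatively correlated miRNA-mRNA pairs, can be sketched in plain Python as follows. The snippet computes the Pearson correlation between every miRNA and mRNA profile measured on the same samples and keeps the pairs below a chosen threshold. The molecule names, expression values and the threshold of -0.8 are hypothetical, and this sketch does not reproduce the miARma-Seq implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def negative_pairs(mirna, mrna, threshold=-0.8):
    """miRNA-mRNA pairs whose correlation is at or below `threshold`,
    sorted from the most to the least negative."""
    pairs = [
        (mi, m, pearson(mi_values, m_values))
        for mi, mi_values in mirna.items()
        for m, m_values in mrna.items()
    ]
    return sorted((p for p in pairs if p[2] <= threshold), key=lambda p: p[2])


# Hypothetical expression profiles across the same five samples.
mirna_expr = {"miR-A": [1, 2, 3, 4, 5]}
mrna_expr = {"GENE_1": [10, 8, 6, 4, 2], "GENE_2": [1, 3, 2, 5, 4]}

candidates = negative_pairs(mirna_expr, mrna_expr)
print(candidates)
```

Candidate pairs obtained this way could then be intersected with predicted or experimentally validated targets, which corresponds to the second approach described above.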
3.2.5. scDRS: Disease-relevance metric in single-cell data
Single-cell disease-relevance score (scDRS) [63] is a tool developed in order to study how different clusters of cells, as defined in a single-cell study, are implicated in the pathogenesis of a disease. From GWAS summary statistics, this tool defines a set of genes relevant for the development of a condition. Subsequently, it analyzes the distribution of cells expressing these relevant genes in comparison to randomly defined sets of genes. This process generates a disease score, enabling the assessment of each cell cluster's implication in the disease. Significantly, this tool can be employed using data exclusively from public repositories or in conjunction with private GWAS or single-cell data.
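The comparison against random gene sets can be sketched in Python as follows. This is a simplified illustration and not the scDRS implementation: it uses unweighted mean expression and fully random control sets, whereas the published method is more elaborate; the function `disease_scores`, the toy expression matrix, and the number of control sets are assumptions.

```python
import numpy as np


def disease_scores(expr, disease_idx, n_ctrl=200, seed=0):
    """Per-cell disease score with an empirical p-value against random gene sets.

    expr: array of shape (n_cells, n_genes); disease_idx: indices of disease genes.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = expr.shape
    # Raw score: mean expression of the disease gene set in each cell
    raw = expr[:, disease_idx].mean(axis=1)
    # Control scores: same statistic for random gene sets of equal size
    ctrl = np.empty((n_ctrl, n_cells))
    for k in range(n_ctrl):
        idx = rng.choice(n_genes, size=len(disease_idx), replace=False)
        ctrl[k] = expr[:, idx].mean(axis=1)
    # Empirical p-value: fraction of control sets scoring at least as high
    pvals = (1 + (ctrl >= raw).sum(axis=0)) / (1 + n_ctrl)
    return raw, pvals


# Toy data: 6 cells x 50 genes; cells 0-2 over-express the 5 "disease" genes
expr = np.ones((6, 50))
expr[:3, :5] = 10.0
disease_idx = [0, 1, 2, 3, 4]

raw, pvals = disease_scores(expr, disease_idx)
print(raw, pvals)
```

Per-cell p-values of this kind can then be summarized within each cluster to judge which cell populations are most likely implicated in the disease.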
3.2.6. NicheNet: Interacting cells prediction
NicheNet [64] is an integrative tool that allows the interpretation of gene-expression changes as the effect of interacting cells, highlighting pairs of cells with potential effects on each other. In recent years, tools to investigate cell-cell communication from single-cell datasets have emerged, but the impact of these interactions on gene expression remains largely unexplored. NicheNet uses data from single-cell or bulk transcriptomics and integrates it with repositories of ligand-receptor interactions, allowing for a comprehensive prediction of the ligands that are affecting expression in a cell of interest, the ligands through which that cell affects other cells, and the signaling pathways that could be mediating these gene-expression changes. This tool integrates the given data into a weighted gene regulatory network, which has shown notable ability to predict the gene regulatory effects of cell-cell interactions.
3.2.7. TwoSampleMR: Mendelian randomization
Mendelian randomization is a powerful method for studying causal relationships between phenotypes, using genetic variants as instrumental variables. TwoSampleMR [65] is an R-based tool that uses summary statistics from independent GWAS instead of individual-level genomic data to perform this analysis. This approach can be used to investigate the causal relationship between a particular IMID and a potential risk factor for the disease, such as stroke or obesity, as long as GWAS data are available for both the disease and the risk factor. By increasing our understanding of how a disease is affected by previous exposures or vice versa, these studies hold great potential to change clinical guidelines for treating and preventing diseases.
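As an illustration of how summary statistics alone can yield a causal estimate, the Python sketch below computes a fixed-effect inverse-variance weighted (IVW) estimate, one of the standard estimators in two-sample Mendelian randomization. It does not reproduce the TwoSampleMR interface: the function `ivw_estimate` and the per-variant effect sizes are hypothetical, and the variants are assumed to be independent and valid instruments.

```python
import math


def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted causal estimate.

    beta_exp: variant effects on the exposure (e.g. the risk factor)
    beta_out: variant effects on the outcome (e.g. the disease)
    se_out:   standard errors of the outcome effects
    """
    num = sum(bx * by / se ** 2 for bx, by, se in zip(beta_exp, beta_out, se_out))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exp, se_out))
    estimate = num / den
    std_err = math.sqrt(1.0 / den)
    return estimate, std_err


# Hypothetical summary statistics for three independent variants
beta_exp = [0.10, 0.20, 0.15]
beta_out = [0.05, 0.10, 0.075]
se_out = [0.01, 0.02, 0.015]

estimate, std_err = ivw_estimate(beta_exp, beta_out, se_out)
print(estimate, std_err)
```

In this toy example every variant has an outcome effect equal to half of its exposure effect, so the combined estimate is 0.5 per unit of exposure.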
3.2.8. Applications of contextual tools in SARDs
Omics studies frequently yield intricate patterns that can be challenging to interpret. Contextual tools enable the exploration of data within various contexts, thus facilitating the comprehension of diverse functional consequences depicted in single-omics results. This is relevant not only for new research but also for reevaluating previous studies within a new framework, which has the potential to generate new hypotheses and uncover fresh insights and results. In a multi-omics study of SLE [66], by assessing proteomics, metabolomics and lipidomics data, the authors correlated the activity of the disease with many immune components, such as the number of B cells, acute inflammation proteins, and elements from the MAP-kinases and ERK1/2 signaling pathways, among others. Notably, the authors were able to extensively describe how the complement system is implicated in the development of the disease, pinpointing specific components like C1QC, C1R, C4A or C4B as associated with disease activity. Moreover, in a giant cell arteritis study [67], the integration of the methylome and transcriptome of CD14+ monocytes allowed the identification of changes in DNA methylation correlated with alterations in gene expression levels. The identified genes were found to play an important role in the pathogenesis of this disease and in the molecular response to glucocorticoids, the most commonly used treatment for this disease.
Omics data were also used in SSc to identify the most crucial immune cell types involved in the pathogenesis of this disease. A study analyzing RNA from skin biopsies of SSc patients revealed the involvement of specific subtypes of macrophages, T cells, and B cells by studying their immune gene expression signatures. The involvement of different cell types correlated with the rate of disease progression, providing a deeper understanding of how immune cell types may affect disease development [68].
Finally, another example of contextual tools is the cross-disease meta-analysis approach. This is a powerful tool for studying the shared genetic component between diseases and has already proven particularly fruitful for autoimmune diseases, as they present a high rate of comorbidity [53], [54]. This approach is particularly relevant for the study of low-prevalence diseases, for which the available statistical power is limited. In a recent vasculitides cross-disease study [56], 16 loci were found to be associated with this group of diseases, 15 of which were previously unreported for some of the included diseases. Interestingly, 3 loci presented opposite risk effects for some of these conditions, potentially indicating divergent pathogenic pathways.
3.3. Translation tools
As omics studies and technologies continue to advance rapidly, there is an increasing demand to harness this deeper understanding for the improvement of clinical treatments and management strategies. In this section, we group tools specifically designed to bridge the gap between omics studies and their practical applications in clinical settings.
3.3.1. NETTAG: Identification of relevant genes for drug repurposing
NETTAG [69] allows the integration of different layers of omics data, with great potential for drug repurposing. Using a deep learning framework, it is designed to identify genes relevant to the development of a disease, showing a promising ability to identify potential treatable targets and genes implicated in regulatory processes. For this purpose, the tool integrates GWAS summary statistics, multi-omics data and protein-protein interaction networks, and it can be coupled to a drug repurposing pipeline to propose potential new treatments for diseases.
3.3.2. AnnoPred & G-PROB: Genetic risk scoring
Polygenic risk scores (PRSs) are a growing research field, as they enable applying the genetic knowledge of diseases to improve disease diagnosis and patient stratification [70], [71]. Multiple PRSs for IMIDs have been developed with notable prediction ability [72], [73], [74]. Nonetheless, their framework can be strengthened by including pertinent information from other omics layers. AnnoPred [75] is a multi-omic tool that assesses both genomic and epigenomic data to perform risk scoring in a population. This approach outperformed the frequently used single-omics PRS strategies. With a different objective, G-PROB [76] aims to facilitate patient triage by integrating multiple PRSs. To address the challenge of differential diagnosis between multiple clinically similar diseases, this framework assumes that a diagnosis of one of those diseases is correct. Subsequently, it calculates the probabilities of the patient being affected by each disease, incorporating relevant genetic information to potentially aid in making more informed medical decisions. As clinical databases, like UK Biobank [27] or FinnGen [28], become larger and more comprehensive, this type of tool gains increased potential, as this growth enables the development of more accurate, complex and clinically relevant prediction models.
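The two ideas above can be sketched in a few lines of Python: a PRS computed as a weighted sum of risk-allele dosages, followed by a normalization that turns several disease scores into probabilities under the assumption that exactly one of the candidate diseases is present. This is only a schematic illustration and not the AnnoPred or G-PROB implementation: the variant identifiers, weights, priors, and both functions are hypothetical, and treating each score as a log relative risk is an assumption of this sketch.

```python
import math


def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2) over the scored variants."""
    return sum(w * dosages.get(snp, 0) for snp, w in weights.items())


def conditional_probabilities(scores, priors):
    """Probability of each disease, assuming exactly one of them is present.

    Each score is treated as a log relative risk (an assumption of this sketch).
    """
    unnorm = {d: priors[d] * math.exp(scores[d]) for d in scores}
    total = sum(unnorm.values())
    return {d: value / total for d, value in unnorm.items()}


# Hypothetical genotype of one patient and per-disease effect sizes
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
weights = {
    "disease_1": {"snp_a": 0.3, "snp_b": 0.1},
    "disease_2": {"snp_b": 0.2, "snp_c": 0.5},
}
priors = {"disease_1": 0.5, "disease_2": 0.5}  # hypothetical clinic-level priors

scores = {d: polygenic_score(dosages, w) for d, w in weights.items()}
probs = conditional_probabilities(scores, priors)
print(scores, probs)
```

With these toy values the patient carries more risk alleles for disease_1, so that diagnosis receives the higher probability; in practice the priors would reflect how common each disease is in the clinical setting.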
3.3.3. DIABLO: Biomarker discovery
The objective of DIABLO [77] is to improve the identification of relevant biomarkers and molecular signatures of phenotypes. It achieves this by evaluating diverse omics datasets, demonstrating remarkable flexibility in its ability to integrate different types of omics data. Notably, its supervised algorithm is designed to maximize common patterns across various datasets, enabling the analysis of subsets of phenotypic groups. In its experimental testing, DIABLO successfully identified biomarkers such as mRNAs, miRNAs, CpGs, proteins, and metabolites. Its application led to the discovery of both known and previously unidentified biomarkers associated with various pathological conditions. These findings demonstrated consistency with existing tools and underscored DIABLO's potential to uncover novel biomarkers.
3.3.4. ONE-Sense: Multimodal single-cell integration
Multi-omics single-cell datasets are currently interpreted largely through dimensionality reduction, for instance to refine cell- and context-specific biomarkers. In recent years, many technologies have emerged that enable simultaneous data collection from individual cells for multiple omic layers. For example, CITE-Seq [78] allows for the collection of transcriptomics and cell-surface proteomics data, while AbSeq [79] is designed for the simultaneous identification of transcriptomics and proteomics information. While these technologies represent a challenging and rapidly advancing field, we wish to highlight the significance of tools like ONE-Sense [80], which simplifies the process of comprehending such multimodal datasets through dimensionality reduction. This approach has been modified and successfully applied in the analysis of multimodal single-cell immune data [81]. It presents great potential for biomarker identification, as evidenced by that study, which unveiled potential biomarkers for recently activated T cells in blood.
3.3.5. ASGARD: Single-cell-based drug repurposing
The availability of single-cell RNA sequencing datasets is rising, and yet the specificity of this type of omics data has been scarcely reflected in drug repurposing pipelines. ASGARD [82] has been introduced as a tool to assess the responsiveness of single-cell clusters to various drugs, providing a drug score that considers cell cluster composition, expression patterns, and deregulated genes. This tool demonstrates significant promise, outperforming bulk-based tools. Considering the opportunities that recent single-cell datasets offer, it shows considerable potential for uncovering novel treatments in IMIDs. Moreover, it could also be used to enhance previous drug repurposing studies, particularly by refining the link between the proposed drug and the specific cell types it targets.
3.3.6. Applications of translation tools in SARDs
Applying the insights gained from SARDs research to clinical settings is a critical objective for enhancing the quality of life for patients suffering from these conditions. The clinical translation of omics results encounters various challenges, with one of the most notable being the limited access to clinical histories required for the validation of translation models, often due to ethical and legal constraints. For this reason, the US National Cancer Institute has proposed criteria to evaluate when omics-based predictors can be considered for clinical trials [83]. Furthermore, any omics translation requires further validation studies to assess the safety of the proposed hypotheses. Nevertheless, multi-omics approaches can expand existing knowledge, paving the way for more evidence-based clinical trials and the formulation of novel hypotheses applicable to clinical practice.
One well-established approach to clinical translation is drug repurposing. Unlike the conventional drug approval process, which is both time-consuming and costly, drug repurposing offers an appealing alternative. This method represents a more efficient process, as it involves the use of drugs that have already received approval, known for their safety in previous applications [84]. This approach is being extensively implemented in SARDs research, as the need for more specific treatments for these diseases is significant [85]. For instance, a cross-disease genomic analysis involving various autoimmune disorders uncovered several promising candidates for their treatment [86].
Integrative tools can also advance personalized medicine, for example by monitoring drug response through multi-omics profiling [87]. In that study, by investigating the proteome, transcriptome and immunophenotype of the patients, the authors were able to describe the changes that followed the remission of the disease, as well as propose different biomarkers that could serve to monitor the disease. To identify subgroups of primary Sjögren's syndrome patients that could receive differential treatment, a multi-omic profiling approach was developed [88]. This study defined four groups of patients with distinct predominant immune pathways and biomarkers, which could lead to the development of personalized treatment strategies for this disease.
Integrating omics data with clinical and demographic variables poses several challenges, and machine learning is a viable way to handle such data effectively. Unsupervised clustering of patients with various systemic autoimmune diseases categorized them into four distinct clusters, each associated with different immune pathways and levels of disease activity [89]. In the case of patients with SLE, a machine learning algorithm demonstrated the ability to identify 9 distinct molecular signatures associated with specific clinical manifestations, drug responses, or long-term remission. This discovery holds significant potential for predicting these outcomes in patients [90]. Similar promising results have been observed in SSc, where a highly accurate model successfully identified 4 distinct subsets of patients [91].
4. Summary and outlook
The integration of multi-omics is gaining traction in SARDs research. Initial studies have showcased its immense potential. By employing these techniques, we can better understand the intricate processes behind these diseases, enhance our interpretations of previous research, and set the stage for more effective clinical approaches.
Integrative approaches offer a wide range of applications, and in this review, we have organized them into three categories according to their purpose: refinement, contextual and translation tools. Furthermore, integrative approaches can be applied not only to newly collected data, but also to reevaluate previous studies and publicly available data, offering a compelling way to advance research in resource-constrained environments.
Machine learning stands out as a powerful approach for unveiling complex mechanisms. Its potential has begun to be explored in systemic rheumatic diseases research through ambitious projects, where it has yielded notable results. Although the adoption of machine learning in rheumatology has been slower compared to other medical fields, it has a remarkable capacity to elucidate the complex systemic mechanisms underlying immune-related diseases [92].
The continued success of integrated approaches in clinical research and practice depends on increased sharing of high-quality, controlled data in public repositories. Such data are essential for training and deploying effective machine learning models in clinical settings.
Funding
This work was supported by the Red de Investigación Cooperativa Orientada a Resultados en Salud, ISCIII (RICOR, RD21/0002/0039) and Ministerio de Ciencia e Innovación, reference: PID2022–13929208-I00. EA-L was the recipient of a postdoctoral fellowship from the regional Andalusian Government (POSTDOC_20_00541). LCT-C was also the recipient of a postdoctoral fellowship from the regional Andalusian Government (POSTDOC_21_00394).
CRediT authorship contribution statement
Conceptualization: GB-Y, LCT-C, MK, EA-L, JM; Investigation: GB-Y, LCT-C, EA-L; Supervision: EA-L, JM; Visualization: GB-Y, LCT-C; Writing - original draft: GB-Y, LCT-C, MK, EA-L; Writing - review & editing: GB-Y, LCT-C, MK, EA-L, JM.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This research is part of the doctoral degree awarded to GB-Y, within the Biomedicine program from the University of Granada, entitled ‘Caracterización del mapa genético de la arteritis de células gigantes’.
References
- 1.Pisetsky D.S. Pathogenesis of autoimmune disease. Nat Rev Nephrol. 2023;19(8):509–524. doi: 10.1038/s41581-023-00720-1.
- 2.McInnes I.B., Gravallese E.M. Immune-mediated inflammatory disease therapeutics: past, present and future. Nat Rev Immunol. 2021;21(10):680–686. doi: 10.1038/s41577-021-00603-1.
- 3.Ortíz-Fernández L., Martín J., Alarcón-Riquelme M.E. A summary on the genetics of systemic lupus erythematosus, rheumatoid arthritis, systemic sclerosis, and Sjögren’s syndrome. Clin Rev Allergy Immunol. 2022;64(3):392–411. doi: 10.1007/s12016-022-08951-z.
- 4.Claussnitzer M., Cho J.H., Collins R., Cox N.J., Dermitzakis E.T., Hurles M.E., et al. A brief history of human disease genetics. Nature. 2020;577(7789):179–189. doi: 10.1038/s41586-019-1879-7.
- 5.Visscher P.M., Wray N.R., Zhang Q., Sklar P., McCarthy M.I., Brown M.A., et al. 10 years of GWAS discovery: biology, function, and translation. Am J Hum Genet. 2017;101(1):5–22. doi: 10.1016/j.ajhg.2017.06.005.
- 6.Abdellaoui A., Yengo L., Verweij K.J.H., Visscher P.M. 15 years of GWAS discovery: realizing the promise. Am J Hum Genet. 2023;110(2):179–194. doi: 10.1016/j.ajhg.2022.12.011.
- 7.Gutierrez-Arcelus M., Rich S.S., Raychaudhuri S. Autoimmune diseases - connecting risk alleles with molecular traits of the immune system. Nat Rev Genet. 2016;17(3):160–174. doi: 10.1038/nrg.2015.33.
- 8.Lin C.M.A., Cooles F.A.H., Isaacs J.D. Precision medicine: the precision gap in rheumatic disease. Nat Rev Rheuma. 2022;18(12):725–733. doi: 10.1038/s41584-022-00845-w.
- 9.Anaya J.M., Duarte-Rey C., Sarmiento-Monroy J.C., Bardey D., Castiblanco J., Rojas-Villarraga A. Personalized medicine. Closing the gap between knowledge and clinical practice. Autoimmun Rev. 2016;15(8):833–842. doi: 10.1016/j.autrev.2016.06.005.
- 10.Guthridge J.M., Wagner C.A., James J.A. The promise of precision medicine in rheumatology. Nat Med. 2022;28(7):1363–1371. doi: 10.1038/s41591-022-01880-6.
- 11.Karczewski K.J., Snyder M.P. Integrative omics for health and disease. Nat Rev Genet. 2018;19(5):299–310. doi: 10.1038/nrg.2018.4.
- 12.Agamah F.E., Bayjanov J.R., Niehues A., Njoku K.F., Skelton M., Mazandu G.K., et al. Computational approaches for network-based integrative multi-omics analysis. Front Mol Biosci. 2022;9 doi: 10.3389/fmolb.2022.967205.
- 13.Tarazona S., Balzano-Nogueira L., Gómez-Cabrero D., Schmidt A., Imhof A., Hankemeier T., et al. Harmonization of quality metrics and power calculation in multi-omic studies. Nat Commun. 2020;11(1):3092. doi: 10.1038/s41467-020-16937-8.
- 14.Lee D., Park Y., Kim S. Towards multi-omics characterization of tumor heterogeneity: a comprehensive review of statistical and machine learning approaches. Brief Bioinform. 2021;22(3):bbaa188. doi: 10.1093/bib/bbaa188.
- 15.Meng C., Kuster B., Culhane A.C., Gholami A.M. A multivariate approach to the integration of multi-omics datasets. BMC Bioinforma. 2014;15:162. doi: 10.1186/1471-2105-15-162.
- 16.Meng C., Zeleznik O.A., Thallinger G.G., Kuster B., Gholami A.M., Culhane A.C. Dimension reduction techniques for the integrative analysis of multi-omics data. Brief Bioinform. 2016;17(4):628–641. doi: 10.1093/bib/bbv108.
- 17.López de Maturana E., Alonso L., Alarcón P., Martín-Antoniano I.A., Pineda S., Piorno L., et al. Challenges in the Integration of Omics and Non-Omics Data. Genes. 2019;10(3):238. doi: 10.3390/genes10030238.
- 18.Way G.P., Zietz M., Rubinetti V., Himmelstein D.S., Greene C.S. Compressing gene expression data using multiple latent space dimensionalities learns complementary biological representations. Genome Biol. 2020;21(1):109. doi: 10.1186/s13059-020-02021-3.
- 19.Liu Z.P. Quantifying gene regulatory relationships with association measures: a comparative study. Front Genet. 2017;8:96. doi: 10.3389/fgene.2017.00096.
- 20.Koppad S.B.A., Gkoutos G.V., Acharjee A. Cloud computing enabled big multi-omics data analytics. Bioinform Biol Insights. 2021;15 doi: 10.1177/11779322211035921. 11779322211035921.
- 21.Krassowski M., Das V., Sahu S.K., Misra B.B. State of the field in multi-omics research: from computational needs to data mining and sharing. Front Genet. 2020;11 doi: 10.3389/fgene.2020.610798. 610798.
- 22.Martorell-Marugán J., López-Domínguez R., García-Moreno A., Toro-Domínguez D., Villatoro-García J.A., Barturen G., et al. A comprehensive database for integrated analysis of omics data in autoimmune diseases. BMC Bioinforma. 2021;22(1):343. doi: 10.1186/s12859-021-04268-4.
- 23.Lonsdale J., Thomas J., Salvatore M., et al. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013;45(6):580–585. doi: 10.1038/ng.2653.
- 24.Consortium R.E., Kundaje A., Meuleman W., Ernst J., Bilenky M., Yen A., et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518(7539):317–330. doi: 10.1038/nature14248.
- 25.Martin F.J., Amode M.R., Aneja A., Austine-Orimoloye O., Azov A.G., Barnes I., et al. Ensembl 2023. Nucleic Acids Res. 2023;51(D1):D933–D941. doi: 10.1093/nar/gkac958.
- 26.Oscanoa J., Sivapalan L., Gadaleta E., Dayem Ullah A.Z., Lemoine N.R., Chelala C. SNPnexus: a web server for functional annotation of human genome sequence variation (2020 update) Nucleic Acids Res. 2020;48(W1):W185–W192. doi: 10.1093/nar/gkaa420.
- 27.Sudlow C., Gallacher J., Allen N., Beral V., Burton P., Danesh J., et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12(3) doi: 10.1371/journal.pmed.1001779. e1001779.
- 28.Kurki M.I., Karjalainen J., Palta P., Sipilä T.P., Kristiansson K., Donner K.M., et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature. 2023;613(7944):508–518. doi: 10.1038/s41586-022-05473-8.
- 29.Gaulton A., Bellis L.J., Patricia Bento A., Chambers J., Davies M., Hersey A., et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012;40(Database issue):D1100. doi: 10.1093/nar/gkr777.
- 30.Regev A., Teichmann S.A., Lander E.S., Amit I., Benoist C., Birney E., et al. The human cell atlas. Elife. 2017;6 doi: 10.7554/eLife.27041. e27041.
- 31.Tryka K.A., Hao L., Sturcke A., Jin Y., Wang Z.Y., Ziyabari L., et al. NCBI’s database of genotypes and phenotypes: dbGaP. Nucleic Acids Res. 2013;42(D1):D975–D979. doi: 10.1093/nar/gkt1211.
- 32.MacArthur J., Bowler E., Cerezo M., Gil L., Hall P., Hastings E., et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog) Nucleic Acids Res. 2017;45(Database issue):D896. doi: 10.1093/nar/gkw1133.
- 33.Perez-Riverol Y., Bai M., da Veiga Leprevost F., Squizzato S., Park Y.M., Haug K., et al. Discovering and linking public omics data sets using the Omics Discovery Index. Nat Biotechnol. 2017;35(5):406–409. doi: 10.1038/nbt.3790.
- 34.Barrett T., Wilhite S.E., Ledoux P., Evangelista C., Kim I.F., Tomashevsky M., et al. NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 2013;41(Database issue):D991–D995. doi: 10.1093/nar/gks1193.
- 35.Leinonen R., Sugawara H., Shumway M. The Sequence Read Archive. Nucleic Acids Res. 2011;39(Database issue):D19. doi: 10.1093/nar/gkq1019.
- 36.Kichaev G., Yang W.Y., Lindstrom S., Hormozdiari F., Eskin E., Price A.L., et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 2014;10(10) doi: 10.1371/journal.pgen.1004722.
- 37.Shabalin A.A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012;28(10):1353. doi: 10.1093/bioinformatics/bts163.
- 38.Casares-Marfil D., Kerick M., Andrés-León E., Bosch-Nicolau P., Molina I., et al. Chagas Genetics CYTED Network, GWAS loci associated with Chagas cardiomyopathy influences DNA methylation levels. PLoS Negl Trop Dis. 2021;15(10) doi: 10.1371/journal.pntd.0009874.
- 39.Sun W. A statistical framework for eQTL mapping using RNA-seq data. Biometrics. 2012;68(1):1–11. doi: 10.1111/j.1541-0420.2011.01654.x.
- 40.Van De Geijn B., McVicker G., Gilad Y., Pritchard J.K. WASP: allele-specific software for robust molecular quantitative trait locus discovery. Nat Methods. 2015;12(11):1061–1063. doi: 10.1038/nmeth.3582.
- 41.Wang A.T., Shetty A., O’Connor E., Bell C., Pomerantz M.M., Freedman M.L., et al. Allele-Specific QTL Fine Mapping with PLASMA. Am J Hum Genet. 2020;106(2):170–187. doi: 10.1016/j.ajhg.2019.12.011.
- 42.Zhang S., Leistico J.R., Cho R.J., Cheng J.B., Song J.S. Spectral clustering of single-cell multi-omics data on multilayer graphs. Bioinformatics. 2022;38(14):3600–3608. doi: 10.1093/bioinformatics/btac378.
- 43.Okada Y., Wu D., Trynka G., Raj T., Terao C., Ikari K., et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature. 2014;506(7488):376–381. doi: 10.1038/nature12873.
- 44.López-Isac E., Acosta-Herrera M., Kerick M., Assassi S., Satpathy A.T., Granja J., et al. GWAS for systemic sclerosis identifies multiple risk loci and highlights fibrotic and vasculopathy pathways. Nat Commun. 2019;10(1):4955. doi: 10.1038/s41467-019-12760-y.
- 45.Lessard C.J., Li H., Adrianto I., Ice J.A., Rasmussen A., Grundahl K.M., et al. Variants at multiple loci implicated in both innate and adaptive immune responses are associated with Sjögren’s syndrome. Nat Genet. 2013;45(11):1284–1292. doi: 10.1038/ng.2792.
- 46.Cherlin S., Lewis M.J., Plant D., Nair N., Goldmann K., Tzanis E., et al. Investigation of genetically regulated gene expression and response to treatment in rheumatoid arthritis highlights an association between IL18RAP expression and treatment response. Ann Rheum Dis. 2020;79(11):1446–1452. doi: 10.1136/annrheumdis-2020-217204.
- 47.Nair N., Plant D., Verstappen S.M., Isaacs J.D., Morgan A.W., Hyrich K.L., et al. Differential DNA methylation correlates with response to methotrexate in rheumatoid arthritis. Rheumatology. 2020;59(6):1364–1371. doi: 10.1093/rheumatology/kez411.
- 48.Gutierrez-Arcelus M., Baglaenko Y., Arora J., Hannes S., Luo Y., Amariuta T., et al. Allele-specific expression changes dynamically during T cell activation in HLA and other autoimmune loci. Nat Genet. 2020;52(3):247–253. doi: 10.1038/s41588-020-0579-4.
- 49.Ishigaki K., Kochi Y., Yamamoto K. Genetics of human autoimmunity: From genetic information to functional insights. Clin Immunol. 2018;186:9–13. doi: 10.1016/j.clim.2017.08.017.
- 50.Iotchkova V., Ritchie G.R.S., Geihs M., Morganella S., Min J.L., Walter K., et al. GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals. Nat Genet. 2019;51(2):343–353. doi: 10.1038/s41588-018-0322-6.
- 51.Finucane H.K., Reshef Y.A., Anttila V., Slowikowski K., Gusev A., Byrnes A., et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat Genet. 2018;50(4):621–629. doi: 10.1038/s41588-018-0081-4.
- 52.Harroud A., Hafler D.A. Common genetic factors among autoimmune diseases. Science. 2023;380(6644):485–490. doi: 10.1126/science.adg2992.
- 53.Ellinghaus D., Jostins L., Spain S.L., Cortes A., Bethune J., Han B., et al. Analysis of five chronic inflammatory diseases identifies 27 new associations and highlights disease-specific patterns at shared loci. Nat Genet. 2016;48(5):510–518. doi: 10.1038/ng.3528.
- 54.Li Y.R., Li J., Zhao S.D., Bradfield J.P., Mentch F.D., Maggadottir S.M., et al. Meta-analysis of shared genetic architecture across ten pediatric autoimmune diseases. Nat Med. 2015;21(9):1018–1027. doi: 10.1038/nm.3933.
- 55.Acosta-Herrera M., Kerick M., González-Serna D., Myositis G.C., Scleroderma G.C., Wijmenga C., et al. Genome-wide meta-analysis reveals shared new loci in systemic seropositive rheumatic diseases. Ann Rheum Dis. 2019;78(3):311–319. doi: 10.1136/annrheumdis-2018-214127.
- 56. Ortiz-Fernández L., Carmona E.G., Kerick M., Lyons P., Carmona F.D., López Mejías R., et al. Identification of new risk loci shared across systemic vasculitides points towards potential target genes for drug repurposing. Ann Rheum Dis. 2023;82(6):837–847. doi: 10.1136/ard-2022-223697.
- 57. Cao Y., Kitanovski S., Hoffmann D. intePareto: an R package for integrative analyses of RNA-Seq and ChIP-Seq data. BMC Genom. 2020;21(11):1–9. doi: 10.1186/s12864-020-07205-6.
- 58. Bray N.L., Pimentel H., Melsted P., Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34(5):525–527. doi: 10.1038/nbt.3519.
- 59. Li H., Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25(14):1754–1760. doi: 10.1093/bioinformatics/btp324.
- 60. Danecek P., Bonfield J.K., Liddle J., Marshall J., Ohan V., Pollard M.O., et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10(2) doi: 10.1093/gigascience/giab008. giab008.
- 61. Andrés-León E., Rojas A.M. miARma-Seq, a comprehensive pipeline for the simultaneous study and integration of miRNA and mRNA expression data. Methods. 2019;152:31–40. doi: 10.1016/j.ymeth.2018.09.002.
- 62. Andrés-León E., González Peña D., Gómez-López G., Pisano D.G. miRGate: a curated database of human, mouse and rat miRNA-mRNA targets. Database. 2015 doi: 10.1093/database/bav035. bav035.
- 63. Zhang M.J., Hou K., Dey K.K., Sakaue S., Jagadeesh K.A., Weinand K., et al. Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nat Genet. 2022;54(10):1572–1580. doi: 10.1038/s41588-022-01167-z.
- 64. Browaeys R., Saelens W., Saeys Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods. 2020;17(2):159–162. doi: 10.1038/s41592-019-0667-5.
- 65. Hemani G., Zheng J., Elsworth B., Wade K.H., Haberland V., Baird D., et al. The MR-Base platform supports systematic causal inference across the human phenome. Elife. 2018;7 doi: 10.7554/eLife.34408. e34408.
- 66. Huang X., Luu L.D.W., Jia N., Zhu J., Fu J., Xiao F., et al. Multi-platform omics analysis reveals molecular signatures for pathogenesis and activity of systemic lupus erythematosus. Front Immunol. 2022;13 doi: 10.3389/fimmu.2022.833699.
- 67. Estupiñán-Moreno E., Ortiz-Fernández L., Li T., Hernández-Rodríguez J., Ciudad L., Andrés-León E., et al. Methylome and transcriptome profiling of giant cell arteritis monocytes reveals novel pathways involved in disease pathogenesis and molecular response to glucocorticoids. Ann Rheum Dis. 2022;81(9):1290–1300. doi: 10.1136/annrheumdis-2022-222156.
- 68. Skaug B., Khanna D., Swindell W.R., Hinchcliff M.E., Frech T.M., Steen V.D., et al. Global skin gene expression analysis of early diffuse cutaneous systemic sclerosis shows a prominent innate and adaptive inflammatory profile. Ann Rheum Dis. 2020;79:1701–1710. doi: 10.1136/annrheumdis-2019-215894.
- 69. Xu J., Mao C., Hou Y., Luo Y., Binder J.L., Zhou Y., et al. Interpretable deep learning translation of GWAS and multi-omics findings to identify pathobiology and drug repurposing in Alzheimer’s disease. Cell Rep. 2022;41(9) doi: 10.1016/j.celrep.2022.111717.
- 70. Kullo I.J., Lewis C.M., Inouye M., Martin A.R., Ripatti S., Chatterjee N. Polygenic scores in biomedical research. Nat Rev Genet. 2022;23(9):524–532. doi: 10.1038/s41576-022-00470-z.
- 71. Torkamani A., Wineinger N.E., Topol E.J. The personal and clinical utility of polygenic risk scores. Nat Rev Genet. 2018;19(9):581–590. doi: 10.1038/s41576-018-0018-x.
- 72. Wang Y.F., Zhang Y., Lin Z., Zhang H., Wang T.Y., Cao Y., et al. Identification of 38 novel loci for systemic lupus erythematosus and genetic heterogeneity between ancestral groups. Nat Commun. 2021;12(1):1–13. doi: 10.1038/s41467-021-21049-y.
- 73. Ishigaki K., Sakaue S., Terao C., Luo Y., Sonehara K., Yamaguchi K., et al. Multi-ancestry genome-wide association analyses identify novel genetic mechanisms in rheumatoid arthritis. Nat Genet. 2022;54(11):1640–1651. doi: 10.1038/s41588-022-01213-w.
- 74. Bossini-Castillo L., Villanueva-Martin G., Kerick M., Acosta-Herrera M., López-Isac E., Simeón C.P., et al. Genomic Risk Score impact on susceptibility to systemic sclerosis. Ann Rheum Dis. 2021;80(1):118–127. doi: 10.1136/annrheumdis-2020-218558.
- 75. Hu Y., Lu Q., Powles R., Yao X., Yang C., Fang F., et al. Leveraging functional annotations in genetic risk prediction for human complex diseases. PLoS Comput Biol. 2017;13(6) doi: 10.1371/journal.pcbi.1005589. e1005589.
- 76. Knevel R., le Cessie S., Terao C.C., Slowikowski K., Cui J., Huizinga T.W.J., et al. Using genetics to prioritize diagnoses for rheumatology outpatients with inflammatory arthritis. Sci Transl Med. 2020;12(545) doi: 10.1126/scitranslmed.aay1548. eaay1548.
- 77. Singh A., Shannon C.P., Gautier B., Rohart F., Vacher M., Tebbutt S.J., et al. DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays. Bioinformatics. 2019;35(17):3055–3062. doi: 10.1093/bioinformatics/bty1054.
- 78. Stoeckius M., Hafemeister C., Stephenson W., Houck-Loomis B., Chattopadhyay P.K., Swerdlow H., et al. Simultaneous epitope and transcriptome measurement in single cells. Nat Methods. 2017;14(9):865–868. doi: 10.1038/nmeth.4380.
- 79. Shahi P., Kim S.C., Haliburton J.R., Gartner Z.J., Abate A.R. Abseq: Ultrahigh-throughput single cell protein profiling with droplet microfluidic barcoding. Sci Rep. 2017;7:44447. doi: 10.1038/srep44447.
- 80. Cheng Y., Wong M.T., van der Maaten L., Newell E.W. Categorical analysis of human T cell heterogeneity with one-dimensional soli-expression by nonlinear stochastic embedding. J Immunol. 2016;196(2):924–932. doi: 10.4049/jimmunol.1501928.
- 81. Mair F., Erickson J.R., Voillet V., Simoni Y., Bi T., Tyznik A.J., et al. A Targeted Multi-omic Analysis Approach Measures Protein Expression and Low-Abundance Transcripts on the Single-Cell Level. Cell Rep. 2020;31(1) doi: 10.1016/j.celrep.2020.03.063.
- 82. He B., Xiao Y., Liang H., Huang Q., Du Y., Li Y., et al. ASGARD is A Single-cell Guided Pipeline to Aid Repurposing of Drugs. Nat Commun. 2023;14(1):1–14. doi: 10.1038/s41467-023-36637-3.
- 83. McShane L.M., Cavenagh M.M., Lively T.G., Eberhard D.A., Bigbee W.L., Williams P.M., et al. Criteria for the use of omics-based predictors in clinical trials. Nature. 2013;502(7471):317–320. doi: 10.1038/nature12564.
- 84. Trajanoska K., Bhérer C., Taliun D., Zhou S., Richards J.B., Mooser V. From target discovery to clinical drug development with human genetics. Nature. 2023;620(7975):737–745. doi: 10.1038/s41586-023-06388-8.
- 85. Reay W.R., Cairns M.J. Advancing the use of genome-wide association studies for drug repurposing. Nat Rev Genet. 2021;22(10):658–671. doi: 10.1038/s41576-021-00387-z.
- 86. Márquez A., Kerick M., Zhernakova A., Gutierrez-Achury J., Chen W.M., Onengut-Gumuscu S., et al. Meta-analysis of Immunochip data of four autoimmune diseases reveals novel single-disease and cross-phenotype associations. Genome Med. 2018;10(1):97. doi: 10.1186/s13073-018-0604-8.
- 87. Tasaki S., Suzuki K., Kassai Y., Takeshita M., Murota A., Kondo Y., et al. Multi-omics monitoring of drug response in rheumatoid arthritis in pursuit of molecular remission. Nat Commun. 2018;9(1):2755. doi: 10.1038/s41467-018-05044-4.
- 88. Soret P., Le Dantec C., Desvaux E., Foulquier N., Chassagnol B., Hubert S., et al. A new molecular classification to drive precision treatment strategies in primary Sjögren’s syndrome. Nat Commun. 2021;12(1):1–18. doi: 10.1038/s41467-021-23472-7.
- 89. Barturen G., Babaei S., Català-Moll F., Martínez-Bueno M., Makowska Z., Martorell-Marugán J., et al. Integrative Analysis Reveals a Molecular Stratification of Systemic Autoimmune Diseases. Arthritis Rheumatol. 2021;73(6):1073–1085. doi: 10.1002/art.41610.
- 90. Toro-Domínguez D., Martorell-Marugán J., Martinez-Bueno M., López-Domínguez R., Carnero-Montoro E., Barturen G., et al. Scoring personalized molecular portraits identify Systemic Lupus Erythematosus subtypes and predict individualized drug responses, symptomatology and disease progression. Brief Bioinform. 2022;23(5) doi: 10.1093/bib/bbac332. bbac332.
- 91. Franks J.M., Martyanov V., Cai G., Wang Y., Li Z., Wood T.A., et al. A Machine Learning Classifier for Assigning Individual Patients With Systemic Sclerosis to Intrinsic Molecular Subsets. Arthritis Rheumatol (Hoboken, NJ) 2019;71(10):1701–1710. doi: 10.1002/art.40898.
- 92. Stafford I.S., Kellermann M., Mossotto E., Beattie R.M., MacArthur B.D., Ennis S. A systematic review of the applications of artificial intelligence and machine learning in autoimmune diseases. NPJ Digit Med. 2020;3:30. doi: 10.1038/s41746-020-0229-3.


