Figure 1.
The DLRAPom pipeline for identifying targetable disease-related lncRNA–miRNA–mRNA regulatory axes by machine learning-guided integrative multiomics analysis. The pipeline consists of four steps: selecting hub biomarkers by conventional bioinformatics analysis, discovering the most essential protein-coding biomarkers with a novel machine learning model, extracting the key lncRNA–miRNA–mRNA axes, and validating them experimentally. In Step 1, hub biomarkers are selected by conventional bioinformatics analyses. These results serve as the inputs to Step 2, which discovers the essential protein-coding biomarkers and assigns an importance to each of them. In Step 3, the competing endogenous RNA network is constructed from lncRNA–miRNA and miRNA–target gene relationships retrieved from databases. Among all the constructed regulatory axes, those containing the protein-coding biomarkers predicted as risk biomarkers by the novel Optimized XGBoost model are selected as the main outcomes of the pipeline and are used for subsequent experimental verification. If there are multiple regulatory axes, they are ranked in descending order of criticality according to the importance of the predicted protein-coding biomarker included in each axis. In Step 4, once the significant expression change of each RNA molecule in the predicted regulatory axes has been confirmed, further supporting evidence is required for the pairwise targeting relationships within those axes. If a targeting relationship has not been reported before, its existence is tested with the dual-luciferase reporter assay. A predicted regulatory axis is considered targetable and reliable only when both its lncRNA–miRNA and its miRNA–mRNA targeting relationships have been experimentally verified.
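The axis selection and ranking described for Step 3 can be illustrated with a minimal sketch. It is not the DLRAPom implementation. The function name `rank_axes`, the pair-list inputs, the `importance` mapping, and all molecule names and scores are hypothetical placeholders. The sketch assumes that database lookups yield lncRNA–miRNA and miRNA–mRNA pairs, and that Step 2 yields one importance score per predicted protein-coding biomarker.

```python
from typing import Dict, List, Tuple

Axis = Tuple[str, str, str]  # (lncRNA, miRNA, mRNA)


def rank_axes(
    lnc_mir_pairs: List[Tuple[str, str]],
    mir_mrna_pairs: List[Tuple[str, str]],
    importance: Dict[str, float],
) -> List[Axis]:
    """Join the two pair lists on the shared miRNA, keep only axes whose
    mRNA is a predicted biomarker, and sort by that biomarker's importance
    in descending order."""
    # Index miRNA -> list of target mRNAs for the join.
    targets: Dict[str, List[str]] = {}
    for mirna, mrna in mir_mrna_pairs:
        targets.setdefault(mirna, []).append(mrna)

    # Build lncRNA-miRNA-mRNA axes, keeping only those whose mRNA
    # has an importance score (i.e. was predicted in Step 2).
    axes: List[Axis] = [
        (lnc, mirna, mrna)
        for lnc, mirna in lnc_mir_pairs
        for mrna in targets.get(mirna, [])
        if mrna in importance
    ]
    return sorted(axes, key=lambda axis: importance[axis[2]], reverse=True)


# Placeholder identifiers and scores, for illustration only.
lnc_mir = [("LNC_A", "miR_1"), ("LNC_B", "miR_2"), ("LNC_C", "miR_3")]
mir_mrna = [("miR_1", "GENE_X"), ("miR_2", "GENE_Y"), ("miR_3", "GENE_Z")]
scores = {"GENE_X": 0.2, "GENE_Y": 0.7}  # GENE_Z was not predicted

ranked = rank_axes(lnc_mir, mir_mrna, scores)
for axis in ranked:
    print(" - ".join(axis), scores[axis[2]])
```

In this toy input the axis ending in `GENE_Z` is dropped because that gene has no importance score, and the remaining axes are ordered by the score of their mRNA. Experimental validation of each pairwise targeting relationship is outside the scope of the sketch.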