Abstract
Background
Small interfering RNA (siRNA) is a powerful tool for gene silencing, but its clinical application is limited by instability and potential immunogenicity. While chemical modification is essential to overcome these hurdles, data on chemically modified siRNAs are currently scattered, hindering rational drug design and development.
Results
We developed CMsiRNAdb, a comprehensive database integrating data resources, analytical tools, and efficacy prediction for chemically modified siRNAs. We consolidated 43,153 experimentally validated sequences and silencing efficiency data derived from 90 patents, covering 36 modification types and 13 therapeutic target genes. The database offers multi-dimensional retrieval, visualization, and batch download functions. Furthermore, we developed ModMapper, a Trie tree-based tool for precise identification of modification sites, and integrated the Cm-siRPred model for efficacy evaluation. CMsiRNAdb is freely accessible at https://cellknowledge.com.cn/CMsiRNAdb/.
Conclusion
CMsiRNAdb provides critical data support and analytical tools for the rational design and rapid optimization of siRNA drugs. By standardizing data and offering predictive capabilities, it significantly advances the development of nucleic acid therapeutics.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12859-025-06359-y.
Keywords: siRNA, Chemical modification, Silencing efficiency, Nucleic acid drugs, Database, Efficacy prediction
Background
Small interfering RNA (siRNA) is a class of double-stranded non-coding RNA molecules, approximately 21–23 nucleotides in length, that has garnered significant attention in gene functional studies and targeted drug development in recent years due to its highly efficient and specific gene silencing capabilities [1, 2]. Unlike traditional small molecule drugs or antibody drugs, siRNA drugs can precisely target and inhibit the post-transcriptional expression of specific mRNAs, offering a unique advantage, particularly for genes that cannot be effectively intervened with by conventional drugs, thereby greatly expanding the scope of potential therapeutic targets [3]. Furthermore, siRNA drug development possesses notable advantages such as a short development cycle and flexible design; the process from sequence screening to cell-level validation can typically be completed within weeks, significantly enhancing the efficiency of drug discovery [4]. Despite the immense promise of siRNA in the field of drug development, natural siRNA molecules face several limitations in vivo. Firstly, natural siRNAs are highly susceptible to degradation by ubiquitous ribonucleases (RNases), resulting in plasma half-lives that are often less than a few hours. Secondly, the characteristic double-stranded structure of siRNAs can readily activate intracellular protein kinase R (PKR) and Toll-like receptors (such as TLR3 and TLR7), thereby triggering strong non-specific immune responses and cytotoxicity. Moreover, the 5’ end “seed region” sequence of siRNAs can lead to off-target effects due to mismatches with non-target mRNAs. These drawbacks collectively severely restrict the clinical translation and application of siRNA drugs [5, 6].
To enhance the in vivo delivery efficiency and stability of siRNA molecules, researchers have developed various nanoparticle delivery systems over the past two decades, including liposomes, polymeric nanoparticles, and lipid-polymer hybrid carriers [7, 8]. These carriers encapsulate siRNAs to shield their negative charge and provide physical protection, significantly prolonging their circulation time in the bloodstream and improving passive or active targeted delivery to specific tissues. However, in clinical practice, the immunogenicity, potential toxicity, and narrow therapeutic window associated with the nanoparticles themselves still limit their widespread application. Compared to nanoparticle strategies, chemical modification technology directly introduces stabilizing groups at the molecular structure level of siRNA, fundamentally improving its pharmacokinetic properties and safety. Specifically, phosphorothioate modification can significantly enhance the resistance of siRNA to nuclease degradation and improve its plasma stability [9]; modifications at the 2’ position of the ribose, such as 2’-O-methyl or 2’-fluoro modifications, can effectively increase molecular thermodynamic stability and target binding affinity [10, 11]. The combined application of multiple chemical modifications has achieved significant success in clinical drug development. For instance, the combination of 2’-O-methyl and phosphorothioate modifications in an siRNA targeting apolipoprotein B (ApoB) achieved effective liver-targeted gene silencing [12]; the FDA-approved siRNA drug Patisiran utilizes both 2’-O-methyl and 2’-fluoro modifications, effectively improving the loading efficiency of the RNA-induced silencing complex (RISC) and significantly reducing off-target effects [13]; another marketed drug, Inclisiran, integrates phosphorothioate, 2’-O-methyl, and 2’-fluoro modifications, achieving long-term, highly efficient, and safe silencing of the PCSK9 gene [14]. 
These examples highlight the central value of chemical modification technology for the druggability and clinical translation of siRNA drugs.
While chemical modifications have significantly optimized the performance and druggability of siRNA drugs, traditional design strategies that depend on experimental validation often require substantial human and financial resources. This is particularly true in high-throughput screening scenarios, where it can take months and cost tens of thousands of dollars, severely limiting drug development efficiency. In recent years, with the rapid advancement of artificial intelligence, especially machine learning and deep learning technologies [15–20], computational approaches for designing chemically modified siRNAs are gradually becoming an important direction in the field. Among the early efforts, Dar et al. developed the SMEpred model based on Support Vector Machine (SVM), achieving preliminary prediction of siRNA silencing efficiency by extracting sequence features [21]. Dong et al. employed the Partial Least Squares (PLS) regression method, improving model prediction performance based on siRNA sequence and physicochemical properties [22]. More recently, our research group further proposed Cm-siRPred, a multi-view learning model based on Transformer and Convolutional Neural Networks (CNN), which significantly enhanced the generalization ability and accuracy of predictions [23]. These data-driven methods demonstrate great potential in reducing development costs, shortening experimental cycles, and improving drug design efficiency [24–26]. However, they also highlight the current algorithms’ strong dependence on high-quality, large-scale datasets for training [27].
Currently, public databases dedicated specifically to chemically modified siRNAs are notably scarce. To date, the only relevant database available is siRNAmod, which contains fewer than 5,000 chemically modified siRNA entries [28]. Over 90% of these data originate from a single publication, leading to a limited and unevenly distributed dataset. Additionally, predictive models such as that by Dong et al. were constructed using only approximately 300 data points. Such small-scale and imbalanced datasets significantly limit the generalization capabilities of existing algorithms and hinder the exploration of underlying mechanisms that drive variations in siRNA efficacy across different cell/tissue types and genetic backgrounds. Furthermore, our team’s recently developed siRNAEfficacyDB [29] primarily focuses on unmodified siRNAs and similarly lacks data relevant to chemically modified siRNA efficacy. Therefore, the absence of larger, standardized, and balanced datasets has become a critical bottleneck restricting the further optimization and application of current computational methods. To overcome these limitations, we constructed the Chemically Modified siRNA inhibition efficacy database (CMsiRNAdb), which integrates 43,153 experimentally validated chemically modified siRNA sequences along with their corresponding efficacy data sourced from patents (Table 1). The database covers 36 mainstream chemical modification types and targets 13 critical genes. Concurrently, we developed complementary tools: ModMapper for accurate parsing and identification of chemical modification types, and Cm-siRPred for predicting siRNA inhibition efficacy. Together, CMsiRNAdb and its associated tools provide a robust data foundation and technical support for the intelligent design and high-throughput screening of chemically modified siRNA therapeutics, ultimately promoting more efficient, precise, and intelligent drug development in this field.
Table 1.
Comparison between siRNAmod and CMsiRNAdb
| Dimension | siRNAmod | CMsiRNAdb |
|---|---|---|
| Data volume | 4,894 entries | 43,153 entries |
| Data source | 96 publications | 90 international patents |
| Modification types | 128 types | 36 major therapeutic types |
| Experimental systems | Not summarized | 39 cell lines including hepatic models |
| Target genes | Not summarized | 13 key genes (e.g., PNPLA3, PCSK9) |
| Data fields | ~ 20 basic fields | Fully structured multi-parameter fields |
| Tools | MarvinSketch | ModMapper + Cm-siRPred |
| Search | Basic, structure-based | Quick, exact, batch, BLAST |
| Visualization | Basic browsing | Statistical charts, batch download |
| Architecture | LAMP (Perl/PHP/MySQL) | 3-tier (HTML/PHP/Python/MySQL) |
| Updates | Semi-annual | Regular updates with versioning |
| Scope | Academic research | Research, drug design, AI modeling |
Construction and content
Data collection and processing
All data in this study were drawn solely from publicly accessible patent documents and do not include any patient information or proprietary databases. Importantly, all data and efficacy measurements originate from in vitro cell-based experiments and explicitly do not involve any animal or human studies. We systematically retrieved and curated our primary dataset from 90 open-access patents worldwide that describe chemically modified siRNA interference inhibitors. A rigorous three-step quality control pipeline was applied: (1) removal of entries with invalid patent status, missing inhibition‐rate measurements, incomplete sequence information, or duplicate sequences; (2) standardization of sequence formats and associated metadata, including correction of notation errors; and (3) supplementation of any missing original double-strand details. The final CMsiRNAdb resource comprises both sense and antisense strand sequences (before and after modification), detailed chemical-modification annotations, experimental parameters (target gene, cell line, concentration, treatment duration), efficacy metrics (inhibition rate ± SD where available), and patent metadata (patent ID, authorization status, title). This database is freely available for non-commercial academic research.
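The first two quality-control steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Entry` fields, the `normalize` rule, and the definition of a duplicate (same patent, same strand pair, same inhibition rate) are all assumptions made for the example, and step 3 (supplementing missing double-strand details) is not shown.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record layout; field names are illustrative, not the database schema.
@dataclass
class Entry:
    patent_id: str
    patent_valid: bool
    sense: Optional[str]        # unmodified sense strand
    antisense: Optional[str]    # unmodified antisense strand
    inhibition_rate: Optional[float]


def normalize(seq: str) -> str:
    """Step 2 (simplified): strip whitespace, upper-case, write T as U.

    Only safe for unmodified base sequences; modified notations often
    use letter case to encode the modification itself.
    """
    return "".join(seq.split()).upper().replace("T", "U")


def quality_control(entries):
    seen = set()
    kept = []
    for e in entries:
        # Step 1: drop invalid patents and missing inhibition-rate measurements.
        if not e.patent_valid or e.inhibition_rate is None:
            continue
        # Step 1: drop entries with incomplete sequence information.
        if not e.sense or not e.antisense:
            continue
        sense, antisense = normalize(e.sense), normalize(e.antisense)
        # Step 1: drop duplicates (assumed duplicate key, see lead-in).
        key = (e.patent_id, sense, antisense, e.inhibition_rate)
        if key in seen:
            continue
        seen.add(key)
        kept.append(Entry(e.patent_id, True, sense, antisense, e.inhibition_rate))
    return kept
```

Keeping the duplicate key explicit makes it easy to adjust if, for example, the same duplex measured at different concentrations should be retained as separate entries.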
ModMapper
To lower the barrier of understanding complex chemically modified sequences for users without a chemistry background, we developed a chemical modification sequence parsing tool, ModMapper (Fig. 1). This tool employs a Trie tree structure, utilizing a longest-prefix-match-first strategy, and integrates an improved regular expression to handle structurally complex notations such as nested parentheses and special characters. Through a dynamic splitting algorithm, the tool can efficiently identify and parse modification elements within most chemically modified sequences.
Fig. 1.
Main architecture of ModMapper
The core parsing function of this tool accepts three input parameters: the original sequence, the chemically modified sequence, and optional rare special modifiers. A preprocessing module automatically cleans redundant markers and abnormal delimiters, thereby enhancing parsing accuracy and robustness. During the parsing process, the tool first utilizes regular expressions to identify complex modification units within nested structures, followed by a greedy matching process based on the Trie tree logic to prioritize capturing the longest matching modifier. To accurately identify the positional information of both forward and backward modifications, a bidirectional merging strategy is implemented, enabling precise localization and annotation of modification sites.
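The greedy, longest-match-first step can be illustrated with a small Trie. The sketch below is an assumption-laden example rather than ModMapper's code: the class names, the `TOKENS` dictionary, and the notation (`Am`, `Uf`, `s`) are hypothetical, and the regular-expression handling of nested parentheses and the bidirectional merging strategy are not reproduced.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.label = None  # set when the path from the root spells a full token


class ModTrie:
    def __init__(self, tokens):
        self.root = TrieNode()
        for token, label in tokens.items():
            node = self.root
            for ch in token:
                node = node.children.setdefault(ch, TrieNode())
            node.label = label

    def longest_match(self, text, start):
        """Return (token, label) for the longest token at text[start:], or None."""
        node, best = self.root, None
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.label is not None:
                best = (text[start:i + 1], node.label)
        return best


def tokenize(text, trie):
    """Split a modified-sequence string into tokens, longest match first."""
    out, i = [], 0
    while i < len(text):
        match = trie.longest_match(text, i)
        if match is None:
            raise ValueError(f"unrecognized notation at position {i}")
        out.append(match)
        i += len(match[0])
    return out


# Illustrative notation only; the real mapping dictionary is on the Help page.
TOKENS = {
    "A": "adenosine", "C": "cytidine", "G": "guanosine", "U": "uridine",
    "Am": "2'-O-methyladenosine", "Um": "2'-O-methyluridine",
    "Af": "2'-fluoroadenosine", "Uf": "2'-fluorouridine",
    "s": "phosphorothioate linkage",
}

tokens = tokenize("AmsUfG", ModTrie(TOKENS))
```

Because the walk records the last complete token it passed, `Am` is preferred over the shorter `A` whenever both match at the same position.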
The parsing results are interpreted by referencing an integrated chemical modification mapping dictionary (detailed on the website’s Help.html page) and further validated against the original sequence. Considering potential structural inconsistencies between the modified and original sequences due to the chemical modifications themselves, a sequence similarity threshold of 80% is adopted as the validation criterion. Failure to meet this threshold automatically triggers a re-parsing process.
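A validation check of this kind could look like the sketch below. The text does not state which similarity measure ModMapper uses, so the choice of `difflib.SequenceMatcher` and the function names are assumptions; only the 80% threshold comes from the description above.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; SequenceMatcher is a stand-in for the real metric."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def passes_validation(parsed_bases: str, original: str, threshold: float = 0.80) -> bool:
    """True if the bases recovered from the modified sequence match the original."""
    return similarity(parsed_bases.upper(), original.upper()) >= threshold


def parse_with_validation(parse_attempts, original, threshold=0.80):
    """Try successive parses (e.g. default, then fallback) until one validates.

    `parse_attempts` is a list of zero-argument callables, each returning the
    base sequence recovered by one parsing strategy (hypothetical interface).
    """
    for attempt in parse_attempts:
        bases = attempt()
        if passes_validation(bases, original, threshold):
            return bases
    return None
```

Passing the parsing strategies as callables keeps the re-parsing trigger separate from the parsers themselves.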
Ultimately, the tool outputs structured data containing the modification type, original notation, and positional information (sugar, base, phosphate, etc.). This system provides robust technical support for the systematic parsing of chemical modification sites in oligonucleotide drugs, facilitating efficient utilization of the chemically modified siRNA sequence resources provided by this database.
Website architecture
The overall architecture of the database website adopts a three-tier structure: presentation layer, logic layer, and data layer. The presentation layer was developed using HTML, CSS, and JavaScript for the frontend, responsible for providing a clean and intuitive interface and a rich user interaction experience [30]. The logic layer was implemented using PHP, which executes corresponding scripts based on user requests, supporting data retrieval and the realization of various functionalities. AJAX technology is integrated to achieve real-time data responsiveness. PHP collaborates with Python, acting as a bridge between the frontend and the parsing functions and backend models, for the deployment of sequence parsing, online design, and efficacy prediction functionalities of the system. The data layer utilizes MySQL for data management and storage, supporting multiple query methods to meet diverse needs.
Furthermore, the system employs the Smarty template engine to achieve separation of the frontend and backend, facilitating subsequent frontend maintenance and upgrades. The website adopts a responsive layout design, ensuring compatibility across different screen sizes and device types, thereby further enhancing the user browsing experience.
Utility and discussion
Data statistics and analysis
The CMsiRNAdb database currently contains 43,153 data entries, encompassing 15,760 siRNA sequences extracted from 90 patents, and covers 13 target genes, 39 cell lines, and 36 different types of chemical modifications (Fig. 2a; detailed in Additional file 1). Regarding the cell line distribution of the data, Hep3B (29.9%), primary cynomolgus monkey hepatocytes (10.9%), COS7 (9.0%), primary human hepatocytes (8.1%), and Huh7 (7.9%) are the most frequently used experimental systems (Fig. 2b). Hep3B and Huh7 are classic models for viral hepatitis research [31, 32], while primary hepatocytes (human and cynomolgus monkey) are considered the gold standard for liver research due to their intact metabolic enzyme systems [33, 34]. COS7 cells are often used for siRNA delivery studies due to their high transfection efficiency [35]. In terms of the target gene distribution, PNPLA3, HSD17B13, AGT, LPA, INHBE, and PCSK9 are the most frequently targeted genes (Fig. 2c). These genes are involved in lipid metabolism (PNPLA3, PCSK9), fibrosis progression (HSD17B13), and cardiovascular diseases (AGT, LPA), among other pathological processes [36–39], suggesting that current siRNA drug development is primarily focused on the treatment of metabolic and chronic diseases. Concerning the distribution of silencing efficiency, most chemically modified siRNA sequences exhibit 60–90% gene silencing (Fig. 2d), and the data display a skewed distribution. Given that the data originate from 90 different patents, which may employ varied experimental conditions and evaluation standards, we further assessed the data consistency across different sources. We analyzed the inhibition rate distributions of the top 5 patents ranked by data contribution (Additional file 2). The box plot shows observable variability in both the median efficiencies and the interquartile ranges across these patents. For instance, the median inhibition rates range from approximately 35% (CN1151176007A) to 71% (WO2023014765A1).
This heterogeneity confirms the potential differences in experimental protocols or standards between patent sources, which is acknowledged as a database limitation in the ‘Discussion’ section. Despite this variability, the data from all major sources cover a wide dynamic range, supporting their value for integrated analysis.
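For readers who download the data and want to repeat this kind of source-consistency check, the per-patent summary behind a box plot can be computed as sketched below. The input format (pairs of patent ID and inhibition rate), the function name, and the toy values are assumptions for illustration only; they are not taken from the database.

```python
from collections import defaultdict
from statistics import median, quantiles


def summarize_by_patent(records, top_n=5):
    """Median and interquartile range for the top_n patents by entry count.

    records: iterable of (patent_id, inhibition_rate) pairs.
    """
    by_patent = defaultdict(list)
    for patent_id, rate in records:
        by_patent[patent_id].append(rate)

    # Rank patents by how many entries they contribute.
    top = sorted(by_patent.items(), key=lambda kv: len(kv[1]), reverse=True)[:top_n]

    summary = {}
    for patent_id, rates in top:
        q1, _, q3 = quantiles(rates, n=4)  # expects at least two values
        summary[patent_id] = {
            "n": len(rates),
            "median": median(rates),
            "iqr": q3 - q1,
        }
    return summary


# Synthetic toy data with made-up patent labels.
toy = [("P1", r) for r in (10, 20, 30, 40, 50)]
toy += [("P2", r) for r in (60, 70, 80)]
toy += [("P3", 90)]
summary = summarize_by_patent(toy, top_n=2)
```

Comparing `median` and `iqr` across sources gives a quick numeric view of the heterogeneity described above before any pooled analysis.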
Fig. 2.
Data statistics of CMsiRNAdb. a Overall statistics for CMsiRNAdb. b Distribution of cell lines. c Distribution of target genes. d Distribution of siRNA efficacy. e Position-dependent modification frequency of siRNA
Further analysis of all siRNA modification patterns reveals that the 3’ end of both the sense and antisense strands shows the highest modification frequency (Fig. 2e). This design characteristic may contribute to enhancing nuclease resistance and promoting RISC assembly [40]. Notably, the low modification probability at positions 2–3 of the sense strand might reflect a design preference for avoiding interference with RISC activation [41]. Regarding the types of chemical modifications, 2’-O-methyl, phosphorothioate, and 2’-fluoro modifications are predominant in both the sense and antisense strands (Additional file 3). This combination strategy balances nuclease resistance (phosphorothioate) with thermodynamic stability and binding affinity (2’-O-methyl and 2’-fluoro) [42].
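A position-dependent modification frequency of this kind can be computed from per-position annotations, as in the sketch below. The input representation (one list of booleans per strand, ordered 5’ to 3’) and the alignment from the 5’ end are assumptions; strands of different lengths could equally be aligned from the 3’ end.

```python
from collections import Counter


def position_frequency(strands):
    """Fraction of strands modified at each 1-based position.

    strands: list of per-position boolean lists (True = modified), 5' to 3'.
    """
    modified = Counter()
    total = Counter()
    for flags in strands:
        for pos, is_modified in enumerate(flags, start=1):
            total[pos] += 1
            modified[pos] += int(is_modified)
    return {pos: modified[pos] / total[pos] for pos in sorted(total)}


# Two toy strands of different length.
freq = position_frequency([
    [True, False, True],
    [False, False, True, True],
])
```

Each position is normalized by the number of strands long enough to reach it, so shorter strands do not dilute the frequencies at later positions.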
Website functionalities
The CMsiRNAdb website offers four core modules to facilitate user access and exploration of database resources, as well as the utilization of analytical tools. These include Search, Browse, the ModMapper tool, and the Cm-siRPred prediction tool. These modules are designed around the efficient retrieval, systematic browsing, and in-depth analysis of chemically modified siRNAs, supporting multi-level and multi-dimensional data manipulation and result acquisition.
The Search module is categorized into four types: Quick Search, Exact Search, Batch Search, and BLAST Search (Fig. 3). The Quick Search function, located at the bottom of the homepage, allows users to directly input modification types (e.g., 2’-O-Methyladenosine, 2’-Fluorouridine) or target genes (e.g., MARC1, CTNNB1) for rapid retrieval, enabling quick localization of desired entries. The Exact Search provides more refined filtering options, supporting the specification of modification types and positions for either the sense or antisense strand, while also allowing for precise queries based on cell type, target gene, and duplex modification location (phosphate group, ribose, or base). The Batch Search module includes three modes: batch search based on modification type, batch search based on target gene, and batch search based on patent ID. Each mode allows for the addition of cell type and duplex modification location criteria to enable large-scale screening and filtering. The BLAST Search function allows users to input DNA or RNA sequences in FASTA format and utilizes the BLASTn (2.13.0+) algorithm to compare against the RNA sequences included in the database, thereby retrieving potential homologous sequences or functionally related molecules. Furthermore, clicking the “more” button for each data entry leads to a detailed information page (Fig. 4).
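Conceptually, an Exact Search combines optional filters into a single parameterized query. The sketch below illustrates the idea with Python's built-in `sqlite3` as a stand-in; the site itself uses PHP with MySQL, and the table name, column names, and rows here are hypothetical and not the CMsiRNAdb schema or data.

```python
import sqlite3


def exact_search(conn, target_gene=None, cell_line=None):
    """Return entries matching every filter that was supplied."""
    clauses, params = [], []
    if target_gene is not None:
        clauses.append("target_gene = ?")
        params.append(target_gene)
    if cell_line is not None:
        clauses.append("cell_line = ?")
        params.append(cell_line)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = (
        "SELECT entry_id, target_gene, cell_line, inhibition_rate "
        "FROM entries" + where + " ORDER BY entry_id"
    )
    # Values are bound as parameters rather than formatted into the SQL string.
    return conn.execute(sql, params).fetchall()


# In-memory toy table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries ("
    "entry_id INTEGER PRIMARY KEY, target_gene TEXT, "
    "cell_line TEXT, inhibition_rate REAL)"
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [
        (1, "PCSK9", "Huh7", 55.0),
        (2, "PCSK9", "Hep3B", 65.0),
        (3, "PNPLA3", "Hep3B", 42.0),
    ],
)
hits = exact_search(conn, target_gene="PCSK9", cell_line="Hep3B")
```

Binding user input as parameters, rather than concatenating it into the query, is the usual way to keep such multi-criteria searches safe from SQL injection.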
Fig. 3.
The search page and result page of CMsiRNAdb
Fig. 4.
Detail page of CMsiRNAdb
Data presentation
The detail page of the website is primarily organized into three key sections: The Patent Basic Information section contains the core metadata of the patent, including the Patent ID, Authorization Status, Accession Number, Patent Title, Target Gene and corresponding Gene ID, Cell Type, Dosage Concentration, Treatment Duration, Inhibition Rate, and its Standard Deviation. The Sequence Information section encompasses the original double-stranded sequences, the chemically modified double-stranded sequences, sequence length, detailed annotations of modification sites on the sense and antisense strands, and the precise localization and specific sequence position of the modified groups (phosphate group, ribose, or base). The Chemical Modification Mapping section provides a comprehensive list of all modification types involved in the current sequence, the abbreviated modifier in the original sequence, detailed annotations of the modifications, and group localization information (Fig. 4).
Browse, statistics, download, submit, and help pages
The Browse module allows for the systematic exploration of entries within the database, including Patent ID, Target Gene, Cell Type, chemically modified sequences of the sense and antisense strands, and their corresponding inhibition rates. Clicking the “more” button for each entry redirects to the detail page for comprehensive information. To facilitate the collection of additional entries from other research, a “Submit” interface is provided for researchers to contribute novel chemical modification information not yet included in the database. To ensure the authenticity and validity of user-submitted content, all submissions will undergo a manual curation process by our team. This review process includes verifying the authenticity of the data source (e.g., patent ID or publication DOI) and ensuring the completeness and accuracy of the provided information. Once validated, the new data will be integrated into the database and clearly labeled as ‘User-Submitted Data’ to distinguish it from the core patent-derived dataset. The “Download” page enables users to easily download complete and detailed files or chemical modification information specific to different target genes. The “Statistics” page offers detailed information, including overall database statistics, the distribution of modified strand entries across cell lines, the distribution of modified strand entries across target genes, the distribution of inhibition rates of modified strands, and the distribution of modification types of modified strands. The “Help” page provides users with comprehensive tutorials on how to operate, query, and browse the CMsiRNAdb database.
ModMapper tool
The development of the ModMapper module primarily addresses the issue of insufficient data standardization in current siRNA chemical modification research. The inconsistent modification nomenclature and recording formats employed by different research teams make direct comparison and integrated analysis of modification data challenging. To this end, this module provides a detailed user guide, allowing users to quickly obtain parsing results by inputting FASTA format sequences before and after chemical modification (Fig. 5). The system automatically extracts modification sites, descriptions of modification types, and localization information of the modified groups, assisting users in systematically understanding the distribution and characteristics of chemical modifications within the sequence [43, 44]. Furthermore, ModMapper supports one-click download of the parsed structured result files, facilitating subsequent analysis and record-keeping for users.
Fig. 5.
ModMapper page of CMsiRNAdb
Case study: a practical workflow for rational siRNA optimization
To demonstrate a practical, research-driven workflow for siRNA optimization, we simulated a use case for improving upon an existing high-efficacy siRNA. As illustrated in Fig. 6, the process begins with an ‘Exact Search’ query for siRNAs targeting PCSK9. From the results, we selected a highly active candidate used in HeLa cells, which has a reported experimental inhibition rate of 79.00%. The base nucleotide sequence of this candidate was then copied into the Cm-siRPred prediction tool for in silico redesign. Using the modification interface, we applied a new rational modification pattern. After submitting this newly designed siRNA for evaluation, the Cm-siRPred tool generated a new efficacy score of 0.8061 (80.61%). This predicted score is slightly higher than the experimentally measured inhibition rate reported in the original patent (80.61% versus 79.00%), although a predicted score and a measured rate are not directly equivalent. The example demonstrates the complete “from query to efficiency prediction” workflow and shows how researchers can use the integrated platform to test new modification patterns in silico before experimental validation.
Fig. 6.
A practical workflow for in silico siRNA optimization using CMsiRNAdb
Discussion
CMsiRNAdb is a dedicated database focusing on the silencing efficiency of chemically modified siRNAs. By systematically integrating experimental data from multi-national patents, it establishes a multi-dimensional structured knowledgebase encompassing sequence information, modified groups and sites, physicochemical parameters, and silencing efficiency. The innovation of this database is primarily reflected in three aspects: Firstly, it integrates chemically modified siRNA data from diverse sources using a unified data standard, addressing the long-standing issue of data fragmentation in this field. Secondly, it develops a multi-level query system, enabling users to perform precise filtering based on key parameters such as target gene, cell type, and modification pattern. More importantly, the database innovatively integrates the ModMapper chemical modification parsing tool and the Cm-siRPred prediction algorithm, providing crucial support for siRNA drug design. These unique features not only offer researchers comprehensive reference data on chemically modified siRNAs but, more importantly, provide essential benchmark data for computer-aided siRNA design research, significantly enhancing the efficiency of siRNA drug design. This holds significant value for promoting the rational design and development of nucleic acid drugs and provides critical data support for understanding the structure-activity relationship between chemical modifications and siRNA activity.
A critical aspect of CMsiRNAdb is the inherent data bias identified during our statistical analysis (Fig. 2b and c). The dataset is characterized by a high concentration of entries from specific cell lines (e.g., Hep3B) and a limited set of target genes (e.g., PNPLA3, HSD17B13). This skewness is an important limitation of the database resource itself. While the integrated Cm-siRPred model was trained on a separate dataset (and thus was not biased by this data), this distribution has two key implications: First, when using CMsiRNAdb as a benchmark or validation set, the overall performance metrics (e.g., RMSE, R²) will be disproportionately influenced by these over-represented categories. Second, any new predictive models trained directly on this dataset in the future would likely inherit this bias, potentially limiting their generalization to under-represented cell lines or novel gene targets. Therefore, users should be aware of this data imbalance when either interpreting validation results or utilizing the database for new model development.
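The first implication can be made concrete by comparing a pooled RMSE with per-category RMSEs. The sketch below uses synthetic numbers and hypothetical category labels; it is not an evaluation of Cm-siRPred or of any data in CMsiRNAdb.

```python
from collections import defaultdict
from math import sqrt


def rmse(pairs):
    """Root-mean-square error over (observed, predicted) pairs."""
    return sqrt(sum((y - p) ** 2 for y, p in pairs) / len(pairs))


def rmse_report(rows):
    """rows: iterable of (group, observed, predicted).

    Returns (pooled RMSE, unweighted mean of per-group RMSEs, per-group RMSEs).
    """
    groups = defaultdict(list)
    for group, y, p in rows:
        groups[group].append((y, p))
    per_group = {g: rmse(v) for g, v in groups.items()}
    pooled = rmse([pair for v in groups.values() for pair in v])
    macro = sum(per_group.values()) / len(per_group)
    return pooled, macro, per_group


# Synthetic example: nine well-predicted entries from an over-represented
# category and one poorly predicted entry from a rare category.
rows = [("common", 50.0, 51.0)] * 9 + [("rare", 50.0, 60.0)]
pooled, macro, per_group = rmse_report(rows)
```

In this toy case the pooled RMSE stays close to the error of the dominant category, while the per-group values expose the much larger error on the rare one; reporting both is one way to account for the imbalance noted above.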
Currently, CMsiRNAdb still has some limitations. Firstly, due to the primary data source being patents, the completeness and reliability of the data have certain constraints. On one hand, patent data typically emphasizes intellectual property protection rather than complete disclosure of experimental details, leading to missing or inconsistent key parameters. On the other hand, the significant differences in experimental design and evaluation criteria across different patents somewhat affect the database’s integrated analysis capabilities. In the future, CMsiRNAdb will continuously expand its data scale, covering more novel chemical modification types, clinical candidate molecules, and indication information. To ensure the long-term usability and credibility of the resource, we plan to implement regular updates (e.g., annually or semi-annually) to incorporate new data from recently published patents and literature, along with a clear version management strategy. Simultaneously, we acknowledge that the integrated Cm-siRPred tool currently lacks a detailed benchmark performance evaluation on the new CMsiRNAdb dataset. A critical future direction is to perform this comprehensive validation, including calculating key metrics (e.g., AUC, RMSE) and systematically comparing its performance against existing methods. This large-scale validation will serve as the foundation for subsequent model iteration and optimization based on this new dataset, which we believe will significantly enhance the generalization ability and applicability of the prediction module. Furthermore, to enhance data complementarity and interoperability, we will implement more specific integration strategies. Future plans include the development of a RESTful API to allow for programmatic access and automated data retrieval. 
Concurrently, we will work to improve data complementarity by adopting standardized identifiers, such as cross-referencing chemical modifications with PubChem IDs and target genes with NCBI Gene IDs. This will strengthen the database’s interconnection with external biological resources, promoting more effective cross-platform data sharing and integrated analysis. This will contribute to the design and development of precise nucleic acid drugs, providing stronger technical support for basic research and clinical translation in the field of nucleic acid drugs.
Conclusions
CMsiRNAdb currently houses 43,153 entries, encompassing 15,760 siRNA sequences from 90 patents, covering 13 target genes, 39 cell lines, and 36 different types of chemical modifications, providing researchers with standardized data resources for chemically modified siRNAs. This database not only offers convenient and rich information query functionalities but also supports multi-dimensional data browsing and analysis, significantly improving the efficiency of chemically modified siRNA research. We believe that CMsiRNAdb will become an important tool for advancing siRNA drug development by integrating data, optimizing prediction algorithms, and standardizing analysis workflows. This will assist researchers in rapidly screening effective modification patterns, evaluating siRNA activity, and accelerating the design and development process of novel nucleic acid drugs. In the future, with the continuous expansion of data scale and the ongoing refinement of its functionalities, this database has the potential to become a crucial bridge connecting basic research and clinical translation.
Acknowledgements
During the preparation of this manuscript, the authors utilized Gemini to enhance the language and readability. Following the use of this service, the authors thoroughly reviewed and revised the content as necessary, and take full responsibility for the final version of the publication.
Author contributions
S.C.H.: Conceptualized and developed the overall architecture of the database and website; participated in data processing. C.C.: Responsible for data collection, curation, and core analysis; generated figures for the study; wrote the original draft of the manuscript. X.R.P.: Designed and implemented the backend tables for the database; contributed to the website development. G.G.X. and Y.Y.: Assisted in the development and testing of certain website functionalities. J.F., Y.Z., H.Z., and K.J.D.: Reviewed and edited the manuscript; provided critical guidance and constructive advice for challenges encountered during the research.
Funding
This work was supported by the National Natural Science Foundation of China (62471071, 62202069, 62302079) and the Chengdu Health Commission-Chengdu University of Traditional Chinese Medicine Joint Research Fund (WXLH202402041).
Data availability
The dataset generated and analyzed during the current study is freely available in the CMsiRNAdb repository, at https://cellknowledge.com.cn/CMsiRNAdb/. The raw data comprising siRNA sequences, chemical modification details, and efficacy metrics were curated from 90 publicly available patents. The complete list of these patent identifiers is accessible and downloadable from the Download page of the CMsiRNAdb repository. These patent documents were originally retrieved from public open-access repositories, including Google Patents (https://patents.google.com/) and the World Intellectual Property Organization (WIPO) database (https://patentscope.wipo.int/).
Declarations
Ethics approval and consent to participate
Not applicable. This study involves the creation of a database (CMsiRNAdb) based on data curated from publicly available patents. No new experiments involving human participants, human data, or human tissue were performed by the authors for this study, and therefore, ethics approval and consent to participate were not required.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Sicheng He and Cheng Chen contributed equally to this work.
Contributor Information
Hasan Zulfiqar, Email: hasanzulfiqar@ism.cams.cn.
Yang Zhang, Email: zhy1001@alu.uestc.edu.cn, yangzhang@cdutcm.edu.cn.
Kejun Deng, Email: dengkj@uestc.edu.cn.
References
1. Alshaer W, Zureigat H, Al Karaki A, Al-Kadash A, Gharaibeh L, Hatmal MM, Aljabali AAA, Awidi A. siRNA: mechanism of action, challenges, and therapeutic approaches. Eur J Pharmacol. 2021;905:174178.
2. Scotti L, Scotti MT. Compounds multi-targets against neglected diseases. Curr Drug Targets. 2024;25(9):575–6.
3. Saw PE, Song EW. siRNA therapeutics: a clinical reality. Sci China Life Sci. 2020;63(4):485–500.
4. Friedrich M, Aigner A. Therapeutic siRNA: state-of-the-art and future perspectives. BioDrugs. 2022;36(5):549–71.
5. Paunovska K, Loughrey D, Dahlman JE. Drug delivery systems for RNA therapeutics. Nat Rev Genet. 2022;23(5):265–80.
6. Zhang MM, Bahal R, Rasmussen TP, Manautou JE, Zhong XB. The growth of siRNA-based therapeutics: updated clinical studies. Biochem Pharmacol. 2021;189:114432.
7. Yonezawa S, Koide H, Asai T. Recent advances in siRNA delivery mediated by lipid-based nanoparticles. Adv Drug Deliv Rev. 2020;154–155:64–78.
8. Sahu C, Sahu RK, Roy A. A review on nanotechnologically derived phytomedicines for the treatment of hepatocellular carcinoma: recent advances in molecular mechanism and drug targeting. Curr Drug Targets. 2025;26(3):167–87.
9. Iwamoto N, Butler DCD, Svrzikapa N, Mohapatra S, Zlatev I, Sah DWY, Meena, Standley SM, Lu G, Apponi LH, et al. Control of phosphorothioate stereochemistry substantially increases the efficacy of antisense oligonucleotides. Nat Biotechnol. 2017;35(9):845–51.
10. Harp JM, Guenther DC, Bisbe A, Perkins L, Matsuda S, Bommineni GR, Zlatev I, Foster DJ, Taneja N, Charisse K, et al. Structural basis for the synergy of 4’- and 2’-modifications on siRNA nuclease resistance, thermal stability and RNAi activity. Nucleic Acids Res. 2018;46(16):8090–104.
11. Malek-Adamian E, Guenther DC, Matsuda S, Martínez-Montero S, Zlatev I, Harp J, Burai Patrascu M, Foster DJ, Fakhoury J, Perkins L, et al. 4’-C-methoxy-2’-deoxy-2’-fluoro modified ribonucleotides improve metabolic stability and elicit efficient RNAi-mediated gene silencing. J Am Chem Soc. 2017;139(41):14542–55.
12. Soutschek J, Akinc A, Bramlage B, Charisse K, Constien R, Donoghue M, Elbashir S, Geick A, Hadwiger P, Harborth J, et al. Therapeutic silencing of an endogenous gene by systemic administration of modified siRNAs. Nature. 2004;432(7014):173–8.
13. Coelho T, Adams D, Silva A, Lozeron P, Hawkins PN, Mant T, Perez J, Chiesa J, Warrington S, Tranter E, et al. Safety and efficacy of RNAi therapy for transthyretin amyloidosis. N Engl J Med. 2013;369(9):819–29.
14. Khvorova A. Oligonucleotide therapeutics - a new class of cholesterol-lowering drugs. N Engl J Med. 2017;376(1):4–7.
15. Zhang HQ, Arif M, Thafar MA, Albaradei S, Cai P, Zhang Y, Tang H, Lin H. PMPred-AE: a computational model for the detection and interpretation of pathological myopia based on artificial intelligence. Front Med. 2025;12:1529335.
16. Zulfiqar H, Guo Z, Ahmad RM, Ahmed Z, Cai P, Chen X, Zhang Y, Lin H, Shi Z. Deep-STP: a deep learning-based approach to predict snake toxin proteins by using word embeddings. Front Med. 2023;10:1291352.
17. Pham NT, Phan LT, Seo J, Kim Y, Song M, Lee S, Jeon YJ, Manavalan B. Advancing the accuracy of SARS-CoV-2 phosphorylation site detection via meta-learning approach. Brief Bioinform. 2023;25(1):bbad433.
18. Pham NT, Rakkiyapan R, Park J, Malik A, Manavalan B. H2Opred: a robust and efficient hybrid deep learning model for predicting 2’-O-methylation sites in human RNA. Brief Bioinform. 2023;25(1):bbad476.
19. Liu Y, Li H, Zeng T, Wang Y, Zhang H, Wan Y, Shi Z, Cao R, Tang H. Integrated bulk and single-cell transcriptomes reveal pyroptotic signature in prognosis and therapeutic options of hepatocellular carcinoma by combining deep learning. Brief Bioinform. 2023;25(1):bbad487.
20. Mohapatra M, Sahu C, Mohapatra S. Trends of artificial intelligence (AI) use in drug targets, discovery and development: current status and future perspectives. Curr Drug Targets. 2025;26(4):221–42.
21. Dar SA, Gupta AK, Thakur A, Kumar M. SMEpred workbench: a web server for predicting efficacy of chemically modified siRNAs. RNA Biol. 2016;13(11):1144–51.
22. Dong X, Zheng W. Cheminformatics modeling of gene silencing for both natural and chemically modified siRNAs. Molecules. 2022;27(19):6412.
23. Liu T, Huang J, Luo D, Ren L, Ning L, Huang J, Lin H, Zhang Y. Cm-siRPred: predicting chemically modified siRNA efficiency based on multi-view learning strategy. Int J Biol Macromol. 2024;264(Pt 2):130638.
24. Basith S, Pham NT, Manavalan B, Lee G. SEP-AlgPro: an efficient allergen prediction tool utilizing traditional machine learning and deep learning techniques with protein language model features. Int J Biol Macromol. 2024;273(Pt 2):133085.
25. Malik A, Kamli MR, Sabir JSM, Rather IA, Phan LT, Kim CB, Manavalan B. APLpred: a machine learning-based tool for accurate prediction and characterization of asparagine peptide lyases using sequence-derived optimal features. Methods. 2024;229:133–46.
26. Liu T, Qiao H, Wang Z, Yang X, Pan X, Yang Y, Ye X, Sakurai T, Lin H, Zhang Y. CodLncScape provides a self-enriching framework for the systematic collection and exploration of coding LncRNAs. Adv Sci. 2024;11(22):e2400009.
27. Ahmed Z, Shahzadi K, Jin Y, Li R, Momanyi BM, Zulfiqar H, Ning L, Lin H. Identification of RNA-dependent liquid-liquid phase separation proteins using an artificial intelligence strategy. Proteomics. 2024;24(21–22):e2400044.
28. Dar SA, Thakur A, Qureshi A, Kumar M. siRNAmod: a database of experimentally validated chemically modified siRNAs. Sci Rep. 2016;6:20031.
29. Zhang Y, Yang T, Yang Y, Xu D, Hu Y, Zhang S, Luo N, Ning L, Ren L. siRNAEfficacyDB: an experimentally supported small interfering RNA efficacy database. IET Syst Biol. 2024;18(6):199–207.
30. Zheng L, Liang P, Long C, Li H, Li H, Liang Y, He X, Xi Q, Xing Y, Zuo Y. EmAtlas: a comprehensive atlas for exploring spatiotemporal activation in mammalian embryogenesis. Nucleic Acids Res. 2023;51(D1):D924–32.
31. Nakabayashi H, Taketa K, Miyano K, Yamane T, Sato J. Growth of human hepatoma cells lines with differentiated functions in chemically defined medium. Cancer Res. 1982;42(9):3858–63.
32. Sells MA, Chen ML, Acs G. Production of hepatitis B virus particles in hep G2 cells transfected with cloned hepatitis B virus DNA. Proc Natl Acad Sci USA. 1987;84(4):1005–9.
33. Lewis LC, Chen L, Hameed LS, Kitchen RR, Maroteau C, Nagarajan SR, Norlin J, Daly CE, Szczerbinska I, Hjuler ST, et al. Hepatocyte mARC1 promotes fatty liver disease. JHEP Rep Innov Hepatol. 2023;5(5):100693.
34. Sultana N, Izawa T, Kamei T, Fujiwara S, Ito Y, Takami Y, Kuwamura M. Application of humanized mice to toxicology studies: properties of chimeric mice with humanized liver (PXB-mice) for hepatotoxicity. J Toxicol Pathol. 2025;38(2):183–9.
35. Duan SY, Ge XM, Lu N, Wu F, Yuan W, Jin T. Synthetic polyspermine imidazole-4, 5-amide as an efficient and cytotoxicity-free gene delivery system. Int J Nanomed. 2012;7:3813–22.
36. Abul-Husn NS, Cheng X, Li AH, Xin Y, Schurmann C, Stevis P, Liu Y, Kozlitina J, Stender S, Wood GC, et al. A protein-truncating HSD17B13 variant and protection from chronic liver disease. N Engl J Med. 2018;378(12):1096–106.
37. Lancellotti P, Oury C. A highly durable RNAi therapeutic inhibitor of PCSK9. N Engl J Med. 2017;376(18):e38.
38. Makhmudova U, Steinhagen-Thiessen E, Volpe M, Landmesser U. Advances in nucleic acid-targeted therapies for cardiovascular disease prevention. Cardiovasc Res. 2024;120(10):1107–25.
39. Yuan LF, Sheng J, Lu P, Wang YQ, Jin T, Du Q. Nanoparticle-mediated RNA interference of angiotensinogen decreases blood pressure and improves myocardial remodeling in spontaneously hypertensive rats. Mol Med Rep. 2015;12(3):4657–63.
40. Sarkar S, Gebert LFR, MacRae IJ. Structural basis for gene silencing by siRNAs in humans. bioRxiv: the preprint server for biology; 2024.
41. Schirle NT, MacRae IJ. The crystal structure of human Argonaute2. Science. 2012;336(6084):1037–40.
42. Behlke MA. Chemical modification of siRNAs for in vivo use. Oligonucleotides. 2008;18(4):305–19.
43. Zheng L, Liu D, Li YA, Yang S, Liang Y, Xing Y, Zuo Y. RaacFold: a webserver for 3D visualization and analysis of protein structure by using reduced amino acid alphabets. Nucleic Acids Res. 2022;50(W1):W633–8.
44. Xu B, Liu D, Wang Z, Tian R, Zuo Y. Multi-substrate selectivity based on key loops and non-homologous domains: new insight into ALKBH family. Cell Mol Life Sci. 2021;78(1):129–41.