Briefings in Bioinformatics. 2025 Jul 16;26(4):bbaf329. doi: 10.1093/bib/bbaf329

Advancing genome-based precision medicine: a review on machine learning applications for rare genetic disorders

Syed Raza Abbas 1,2, Zeeshan Abbas 2,3,2, Arifa Zahir 4, Seung Won Lee 5,6,7,8,9
PMCID: PMC12265892  PMID: 40668553

Abstract

Precision medicine tailors medical care to individual genetic profiles and offers transformative solutions for rare genetic conditions. Machine learning (ML) has enhanced genome-based precision medicine (GBPM) by enabling accurate diagnoses, customized treatments, and risk assessments. ML tools, including deep learning and ensemble methods, process high-dimensional genomic data and reveal new insights into rare diseases. This review analyzes ML applications in GBPM, emphasizing their role in disease classification, therapeutic optimization, and biomarker discovery. Key challenges, such as computational complexity, data scarcity, and ethical concerns, are discussed alongside advancements such as hybrid ML models and real-time genomic analysis. Security issues, including data breaches and ethical challenges, are also addressed. This review identifies future directions, emphasizing the need for interpretable ML models, broader data-sharing frameworks, and global collaborations. By integrating current research, this study provides a comprehensive perspective on the use of ML for rare genetic disorders, paving the way for transformative advancements in precision medicine.

Keywords: artificial intelligence, internet of things, machine learning, deep learning, healthcare, GBPM, explainable AI

Introduction

Background and motivation

Rare genetic disorders affect millions of people worldwide, presenting challenges in diagnosis and treatment. Fortunately, advances in genomic technology, particularly next-generation sequencing (NGS), have improved our ability to identify genetic causes, enhance diagnostic accuracy, and deepen our understanding of these conditions [1]. Machine learning (ML), and deep learning (DL) in particular, have revolutionized genomic medicine by directly analyzing complex data to uncover patterns and relationships often missed by traditional approaches [2, 3].

In the clinical setting, ML enhances the diagnosis of rare genetic disorders by integrating genomic data with clinical phenotypes. ML models streamline diagnostics by prioritizing genes and predicting pathogenic variants, thereby reducing the time to diagnosis. Studies have highlighted the role of ML in improving the accuracy of variant interpretation in NGS-based diagnostics [4]. The benefits of ML extend beyond disease diagnosis to aiding in the creation of customized treatment options. By processing genomic and clinical data, ML models can identify potential treatment targets and predict patient responses to specific therapies [5].

Despite these advancements, the application of ML in genome-based precision medicine (GBPM) for rare genetic disorders faces several challenges. Data scarcity is a major barrier: because these conditions are rare, few high-quality annotated datasets are available for training effective ML models. Moreover, the complexity of genomic data requires advanced computational methods to efficiently handle and analyze extensive information [6].

Recent advances have begun to address some of these challenges. The development of hybrid ML models that combine multiple data types, such as genomic, transcriptomic, and clinical data, has shown promise for improving predictive performance. Furthermore, real-time genomic analysis facilitated by ML algorithms is becoming increasingly feasible, enabling more timely and accurate clinical decision-making [7]. Table 1 presents the integration of ML and genomic technologies for investigating rare genetic disorders.

Table 1.

Integration of ML and genomic technologies in rare genetic disorder research

Category | Details | Impact | Examples | Challenges Addressed | Ref
Genomic technologies (NGS) | Revolutionized variant identification, improving diagnostic accuracy. | Accelerates rare disorder diagnosis and understanding of genetic mechanisms. | Genome-wide association studies (GWAS). | Overcoming diagnostic delays. | [1]
ML in genomic medicine | Enhances diagnostic processes for rare genetic disorders; Identifies therapeutic targets. | Drives precision medicine by integrating and analyzing complex datasets. | Variant prioritization tools, gene prediction. | Data integration, improving diagnostic accuracy. | [8, 9]
Recent advancements | Hybrid ML models integrating multiple data types; Real-time genomic analysis. | Improves predictive accuracy, enables timely clinical decision-making, and broadens applications. | Multi-omics data integration, real-time prediction tools. | Enhances model performance and decision support in clinical settings. | [10]
ML in personalized medicine | Facilitates design of individualized therapeutic interventions. | Aligns with precision medicine to tailor medical care to patients’ unique characteristics. | Drug response prediction, target identification. | Addresses variability in patient responses to treatment. | [11]

Overview of ML in GBPM

The integration of ML into GBPM has significantly advanced the diagnosis and treatment of rare genetic disorders. ML algorithms, particularly DL models, excel at analyzing complex genomic data, identifying patterns that facilitate accurate disease classification, and discovering new biomarkers [12, 13]. In clinical diagnostics, ML improves the interpretation of genomic variants by predicting their pathogenicity, thereby facilitating the identification of causative mutations in patients with rare diseases. For example, AI-MARRVEL, an ML system developed to prioritize potentially causal variants for Mendelian disorders, has shown improved diagnostic efficiency [14, 15].

Security concerns, including the risk of data breaches and ethical challenges in genomic data sharing, are also major problems. Therefore, it is important to use strong encryption methods and develop thorough policy frameworks to minimize such risks [16]. The development of explainable ML models, enhanced data-sharing frameworks, and global collaboration is vital for advancing GBPM. These efforts aim to overcome existing barriers and fully realize the potential of ML to improve the diagnosis and treatment of rare genetic disorders [17]. Figure 1 presents the importance scores of the key aspects of ML in GBPM. Recent references from 2023 support these key aspects [18–21].

Figure 1.

This figure shows the relative importance of six thematic areas in ML applications to GBPM. These importance scores were derived through qualitative analysis of literature from 2023 and 2024, taking into account both thematic frequency and citation weight. The scores were then normalized to a 0–100 scale to enable comparative visualization across categories.

Relative importance of key thematic areas in ML applications to GBPM (2023–2024).
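
The caption states that the scores combine thematic frequency and citation weight and are normalized to 0–100, but it does not give the formula. The following is a minimal sketch of one way such a normalization could work, assuming an equal-weight blend of the two inputs scaled so that the highest-scoring theme maps to 100. The function names, the `alpha` weight, the theme labels, and all input numbers are illustrative placeholders, not values or methods taken from the review.

```python
def combine(frequency, citation_weight, alpha=0.5):
    """Hypothetical blend of thematic frequency and citation weight."""
    return {
        theme: alpha * frequency[theme] + (1 - alpha) * citation_weight[theme]
        for theme in frequency
    }


def normalize_scores(raw, scale=100.0):
    """Scale raw scores so that the largest one maps to `scale`."""
    top = max(raw.values())
    if top <= 0:
        raise ValueError("at least one raw score must be positive")
    return {theme: round(scale * value / top, 1) for theme, value in raw.items()}


# Placeholder inputs (not data from the review).
frequency = {"theme_a": 40, "theme_b": 20, "theme_c": 10}
citation_weight = {"theme_a": 60, "theme_b": 20, "theme_c": 30}

scores = normalize_scores(combine(frequency, citation_weight))
```

Scaling by the maximum keeps the ranking and the ratios between themes intact. A min-max rescaling would instead force the lowest theme to 0, which may overstate differences between categories.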

ML in medicine

ML has become a transformative force in medicine, enhancing diagnostic accuracy, personalizing treatments, and streamlining healthcare operations.

Diagnostic accuracy and disease prediction

ML algorithms have transformed disease diagnosis by enabling early and accurate detection. In 2023 and 2024, ML models have demonstrated an exceptional ability to identify complex patterns in medical imaging data, particularly for diseases such as diabetes, cancer, Alzheimer’s disease, and cardiovascular disorders [22–24].

Personalized treatment plans

Personalized medicine tailors procedures to the unique genetic, environmental, and lifestyle characteristics of each patient. ML has advanced this approach by analyzing diverse datasets to predict specific patient responses to therapies [25]. In 2024, ML-enabled systems were increasingly used to design individualized care pathways. These tools consider treatment options, drug interactions, and patient preferences to provide clinicians with actionable insights.

Drug discovery and development

ML supports the repurposing of existing drugs by identifying new applications based on shared molecular processes. These innovations underscore the potential of ML to address unmet medical needs while optimizing resource allocation in the pharmaceutical industry. Additionally, recent studies have employed ML models to predict off-target effects and therapeutic efficacy, leading to the discovery of new uses for approved drugs [26].

The integration of ML with drug discovery has enhanced the identification and development of new therapies. Algorithms can rapidly screen billions of molecular structures to identify potential drug candidates, thereby significantly reducing the time and costs associated with traditional drug development methods [27].

Operational efficiency in healthcare

ML enhances clinical workflows. Decision-aid systems use real-time data to prioritize urgent cases and ensure timely care. For instance, AI-driven triage tools can analyze symptoms and vital signs to predict patient outcomes, enabling healthcare providers to allocate resources more effectively. These applications demonstrate how ML contributes to a more efficient and patient-centric healthcare system [28, 29].

ML has also streamlined several administrative tasks in healthcare, improving operational efficiency. Automated scheduling systems powered by ML optimize patient appointments, minimize waiting times, and enhance resource utilization [30].

Ethical considerations

The adoption of ML in medicine faces significant challenges. Data privacy remains a critical issue because ML models require access to vast amounts of sensitive information. Ensuring that patient data are secure and ethically used is important, especially with the increasing number of reports of data breaches. Furthermore, algorithmic bias can lead to disparities in healthcare outcomes, underscoring the need for fair and transparent model development [31–33]. Table 2 summarizes recent advancements and challenges of ML in medicine. It details specific areas, such as diagnostic accuracy, personalized treatment plans, and drug discovery, highlighting developments from 2023 to 2024 and providing references for each.

Table 2.

Recent advancements and challenges of ML in medicine

Area | Description | Recent developments (2023–2024) | Ref
Diagnostic accuracy and disease prediction | ML models are transforming disease diagnosis by recognizing complex patterns in clinical and imaging data. These models have proven particularly effective in diagnosing diseases such as cancer, diabetes, and cardiovascular disorders. | DL models for predicting RNA-Seq expression in tumors for cancer diagnostics. | [34, 35]
Personalized treatment plans | ML has enhanced personalized medicine by predicting patient-specific responses to therapies. This approach tailors interventions to genetic, environmental, and lifestyle factors, improving treatment outcomes. | Genomic data-driven treatment predictions in oncology. Systems that design individualized care pathways, including drug interactions and patient preferences. | [36, 37]
Drug discovery and development | ML is accelerating drug discovery by screening molecular structures and optimizing drug development, reducing time and costs. It is also advancing drug repurposing by predicting new uses for existing medications. | AI-driven drug design for diseases like Parkinson’s and autoimmune disorders. ML for predicting off-target effects and therapeutic efficacy. | [38, 39]
Operational efficiency in healthcare | ML improves healthcare operations by automating administrative tasks and streamlining clinical workflows. It enhances scheduling, resource allocation, and care delivery. | Automated patient scheduling to reduce wait times. Real-time decision-support systems prioritizing urgent cases. | [40, 41]
Challenges and ethical considerations | Data privacy, algorithmic bias, and ethical concerns such as informed consent and accountability are challenges in adopting ML in healthcare. | Ongoing research to mitigate data privacy risks. Efforts to address algorithmic bias and ensure model transparency. | [42, 43]

Opportunities

ML algorithms have the potential to improve the diagnosis of rare genetic disorders by identifying subtle patterns in genomic data that might be overlooked by traditional methods. For instance, AI-MARRVEL was developed to prioritize variants that could cause Mendelian disorders, thus streamlining the diagnostic process [9].

The combination of AI and CRISPR technologies has the potential to revolutionize drug discovery by enabling the rapid identification of therapeutic targets and the development of new treatments for genetic disorders. This collaboration could lead to more effective and efficient therapeutic interventions [44]. Figure 2 illustrates CRISPR, which is the most versatile, affordable, and user-friendly genome-editing technology available today. ML and DL algorithms are integrated to enhance its efficiency and precision [45, 46].

Figure 2.

This figure presents a theoretical workflow integrating CRISPR gene editing with ML models for precision genome correction. The model is trained on CRISPR data to identify and guide the repair of defective gene sequences. This enables the targeted modification, removal, insertion, or alteration of DNA segments to restore proper gene function.

A theoretical representation of CRISPR gene editing utilizing an ML computational model.

Scope and purpose of this review

This review article consolidates and advances our understanding of the intersection of ML, genomics, and precision medicine, particularly in addressing rare genetic disorders. Rare genetic disorders represent a significant challenge in healthcare and are characterized by complex diagnosis, limited treatment options, and a small number of affected individuals. This review aims to provide a comprehensive perspective on how ML technologies can enhance GBPM and improve outcomes for these conditions. Table 3 shows the distinctiveness of this review by comparing its scope, methodological advances, challenges addressed, identified knowledge gaps, and unique contributions with those of other articles in the field.

Table 3.

Comparison of this review with existing review articles on ML in GBPM for rare genetic disorders

Ref | Scope | Methodological advances | Challenges/limitations | Knowledge gaps | Unique contributions
[47] | General ML in healthcare | Standard ML techniques | Data scarcity and complexity | Overlooks ethical issues | General applications in healthcare without focus on rare disorders
[48] | Genomics broadly | Basic genomics applications | Computational demands, general ethical concerns | Lack of real-world applications | Broad discussion, no specific focus on rare disorders
[49] | Precision medicine | Some new methodologies like AI | Ethical concerns, data integration challenges | Standardization issues | Advances in precision medicine without specific ML focus; works mainly on cancer
[50] | Broad ML applications in various diseases | Traditional diagnostic tools | General scalability issues, lack of interpretability | Overlooks rare genetic disorders | Diverse but not specific to rare genetic disorders
[51] | ML and data science in medicine | Basic ML applications | Data heterogeneity, lacks depth in clinical integration | Misses interdisciplinary insights | Focuses on data science without integrating clinical insights
This review | ML in GBPM specifically for rare genetic disorders | Advanced ML methodologies like XAI, real-time genomic tools | Addressing data scarcity, computational complexity, ethical concerns comprehensively | Identifies critical gaps in rare disorders | Interdisciplinary insights, advanced methodologies, specific focus on clinical implementation and policy implications

Scope of the review

This review covers multiple disciplines, including clinical medicine, computational biology, and genomics. It explores how advancements in ML have been adapted to solve problems unique to rare genetic disorders, such as data scarcity and genomic complexity.

The review investigates specific applications of ML, including:

  • Enhanced diagnostic processes for identifying pathogenic genetic variants;

  • The development of personalized treatment regimens based on individual genomic profiles;

  • Accelerated drug discovery and repurposing using ML to identify potential therapeutic targets.

A significant part of the scope includes an evaluation of the barriers to implementing ML in GBPM. These barriers include data heterogeneity, computational demands, ethical issues, and the need for interpretability in clinical settings. This review highlights the recent advancements in ML methodologies, such as explainable AI (XAI), hybrid ML models integrating multi-omics data, and real-time genomic analysis tools that enhance clinical decision-making.

Purpose of the review

By synthesizing research findings from 2020 to 2025, this review aims to provide an updated and cohesive understanding of the current state of ML in GBPM. It seeks to bridge gaps in the literature by addressing underexplored areas such as real-world implementation challenges and ethical considerations.

This review seeks to identify the knowledge and practice gaps that hinder the full potential of ML in this field. This review serves as a roadmap for future research by proposing the following actionable recommendations:

  • Develop interpretable and clinically applicable ML models;

  • Enhance data-sharing frameworks to facilitate global collaborations; and

  • Explore novel algorithms to handle high-dimensional genomic data efficiently.

This review underscores the importance of aligning advancements in ML with ethical guidelines and regulatory frameworks to ensure equitable and secure use of genomic data. By addressing these aspects, this review aims to inform healthcare policymakers and regulatory bodies.

Contributions to the literature

This section explains how this study extends the existing knowledge, addresses gaps, and provides novel insights into the application of ML in GBPM for rare genetic disorders.

One of the primary contributions of this study is its ability to bridge gaps in the existing literature. Despite significant advances in ML and genomics, the application of these technologies to rare genetic disorders remains limited. This study merges scattered findings and presents a cohesive viewpoint, offering a comprehensive overview that was previously unavailable. A review of diverse case studies provides insights into specific applications, such as biomarker discovery, disease classification, and personalized treatments. This study reveals the importance of interdisciplinary methods by integrating insights from each field, which demonstrates how ML can act as a bridge between computational data analysis and clinical applications, encouraging collaborative efforts among researchers from diverse backgrounds.

Another main contribution of this study is the identification and evaluation of advanced ML techniques, including DL, hybrid models, and ensemble approaches, for rare genetic disorders. This paper not only discusses their theoretical applications but also provides practical insights into their implementation challenges, clinical relevance, and performance metrics. This paper systematically discusses the challenges associated with ML in GBPM, including data scarcity, computational complexity, and ethical problems. In contrast to many studies focusing solely on technological aspects, this review integrates discussions on regulatory and ethical implications, offering a holistic view of the field. Moreover, it proposes actionable recommendations such as enhanced data-sharing frameworks, XAI models, and standardized protocols for genomic data management. In addition to its academic contributions, this study has practical importance for healthcare professionals, technology developers, and policymakers. The demonstration of how ML can improve diagnostic accuracy, accelerate drug discovery, and personalize treatment illustrates the transformative potential of these technologies in clinical settings. This practical focus ensures that the research is not only theoretically robust, but also actionable.

Methodology

This section describes the structured, rigorous approach employed in this review to analyze the intersection of ML and GBPM for rare genetic disorders. Fig. 3 shows the full workflow for the systematic review, which follows PRISMA guidelines and covers every stage from database identification to the final decision to include specific research studies.

Figure 3.

This figure outlines the systematic review methodology, beginning with research question formulation and branching into parallel processes of search strategy and study selection. These steps involve database identification, keyword generation, and inclusion/exclusion criteria application. The extracted data are qualitatively analyzed and undergo quality assessment using standardized checklists to ensure rigor and reliability.

Workflow diagram for the systematic review methodology.

Research questions

This review addresses the critical aspects of the application of ML in GBPM by formulating the following research questions:

  • What are the current applications of ML in GBPM for rare genetic disorders?

  • What are the challenges and opportunities related to integrating ML in this domain?

  • What tools, advancements, and methodologies have emerged in recent years and what are the future research directions?

These questions were designed to provide a comprehensive understanding of the field and to guide the selection and analysis of relevant studies.

Search strategy

A multistage search strategy was adopted to ensure the inclusion of relevant high-quality literature.

  • Database Selection: The review utilized academic databases, including PubMed, Scopus, IEEE Xplore, MDPI, and Google Scholar. These platforms were selected for their extensive inclusion of biomedical, computational, and engineering studies.

  • Keyword Formulation: Combinations of the following terms were used to maximize the retrieval of relevant studies.

    • “ML in Precision Medicine”

    • “Genomics and ML”

    • “Rare Genetic Disorders and AI”

    • “Genome-Based Diagnostics”

  • Timeframe: To capture recent advancements while acknowledging foundational studies, the search was restricted to articles published between 2020 and 2025.

  • Additional Sources: To ensure comprehensiveness, manual searches were conducted on the references of the selected articles and key journals.
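
The keyword and timeframe steps above can be illustrated with a short sketch. It assumes the four listed phrases are joined with OR into a single Boolean string and that the 2020–2025 window is applied as a separate filter. The review does not state how the terms were actually combined, and each database has its own query syntax, so `build_boolean_query` and `date_filter` are hypothetical names used only for illustration.

```python
SEARCH_TERMS = [
    "ML in Precision Medicine",
    "Genomics and ML",
    "Rare Genetic Disorders and AI",
    "Genome-Based Diagnostics",
]


def build_boolean_query(terms):
    """Join quoted phrases with OR so that any one phrase retrieves a record."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


query = build_boolean_query(SEARCH_TERMS)

# Assumed representation of the publication window stated in the text.
date_filter = {"from_year": 2020, "to_year": 2025}

print(query)
```

Quoting each phrase keeps multi-word terms together. In practice the generic string would need to be adapted to the field tags and operators of each database.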

Figure 4 presents the approximate number of peer-reviewed papers (journal articles, conference papers) published annually between 2015 and 2024 on ML applications in genome-driven precision medicine for rare genetic disorders. This figure was compiled from bibliometric analyses of PubMed-indexed literature and cross-disciplinary databases (e.g. Web of Science, IEEE Xplore, Google Scholar).

Figure 4.

This figure shows the annual publication trends from 2015 to 2024 for ML applications in GBPM related to rare genetic disorders. The graph shows a slow increase in the early years, followed by exponential growth from 2020 onward. This sharp rise reflects the growing research interest and advancements in AI-based genomics and precision medicine.

Annual publication trends (2015–2024) in ML applications for GBPM in rare genetic disorders.

Study selection

The selection process involved two phases.

Phase 1—Title and Abstract Screening: Articles were screened for relevance based on their titles and abstracts. Studies addressing ML in GBPM and rare genetic disorders were added to the list of candidates.

Phase 2—Full Text Review: Candidate articles were reviewed in full to assess their alignment with the research objectives and inclusion criteria. The inclusion and exclusion criteria ensured that only relevant high-quality studies were included.

Table 4 shows the summary of the studies retrieved, screened, excluded, and finally included.

Table 4.

Summary of study selection process following PRISMA guidelines

Stage | Number of articles
Records identified through database searching | 865
 – PubMed | 220
 – Scopus | 185
 – IEEE Xplore | 145
 – Google Scholar | 315
Additional records identified through manual search | 42
Total records before duplicates removed | 907
Duplicates | 132
Records after duplicates removal | 775
Records excluded after title/abstract screening | 534
Full-text articles assessed for eligibility | 241
Full-text articles excluded after applying exclusion criteria | 172
Studies included in final systematic review | 69
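
The counts in Table 4 can be checked with simple arithmetic. The sketch below reads the 172 figure as the number of full-text articles excluded, since that is the only reading under which 241 assessed articles reduce to the 69 included studies. The variable names are illustrative.

```python
# Counts taken from Table 4.
by_database = {"PubMed": 220, "Scopus": 185, "IEEE Xplore": 145, "Google Scholar": 315}
manual_search = 42
duplicates = 132
excluded_title_abstract = 534
excluded_full_text = 172  # read as "excluded", see the note above

identified = sum(by_database.values())                       # database records
total_records = identified + manual_search                   # before deduplication
after_dedup = total_records - duplicates                     # unique records
full_text_assessed = after_dedup - excluded_title_abstract   # passed screening
included = full_text_assessed - excluded_full_text           # final synthesis

print(identified, total_records, after_dedup, full_text_assessed, included)
```

Each derived value matches the corresponding row of the table (865, 907, 775, 241, and 69).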

Inclusion Criteria

  • The study focused on ML applications in genomics and rare genetic disorders.

  • The study was a peer-reviewed journal article or conference proceeding.

  • The publication had sufficient methodological details and reproducible results.

Exclusion Criteria

  • The study was a non-peer-reviewed article, opinion piece, or grey literature.

  • The study had incomplete or ambiguous methodologies.

  • The article was unrelated to ML, genomics, or rare genetic disorders.

Data extraction

A standardized data extraction protocol was developed to ensure consistency and comprehensiveness. The key details extracted from each study are as follows:

  • Research objectives and hypotheses.

  • Types and sources of genomic data utilized in the study.

  • ML methodologies and models (e.g. DL, ensemble methods).

  • Evaluation metrics used to assess model performance.

  • Key findings, limitations, and future directions proposed by the authors.
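
The extraction fields listed above map naturally onto a record type. The sketch below is one possible structure, not the protocol the authors used. The class name, field names, and the example values are hypothetical, and `missing_fields` is an added convenience for spotting incomplete entries.

```python
from dataclasses import dataclass, asdict


@dataclass
class ExtractionRecord:
    """One row of a data extraction sheet (illustrative field names)."""
    study_id: str
    objectives: str
    genomic_data_types: list
    data_sources: list
    ml_methods: list
    evaluation_metrics: list
    key_findings: str
    limitations: str = ""
    future_directions: str = ""

    def missing_fields(self):
        """Names of fields left empty, in declaration order."""
        return [name for name, value in asdict(self).items() if value in ("", [], None)]


# Placeholder entry, not a study from the review.
record = ExtractionRecord(
    study_id="example-001",
    objectives="Prioritize candidate variants",
    genomic_data_types=["exome"],
    data_sources=["placeholder cohort"],
    ml_methods=["ensemble"],
    evaluation_metrics=["AUROC"],
    key_findings="placeholder",
)
print(record.missing_fields())
```

Keeping every study in the same structure makes it easier to compare methods and metrics across papers and to flag entries that still need review.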

Quality assessment

The quality of each study was rigorously evaluated using a modified Critical Appraisal Skills Programme checklist. The key assessment criteria were as follows:

  • Clarity and specificity of the research objectives

  • Robustness and reproducibility of the methodologies

  • Appropriateness of the evaluation metrics and datasets

  • Consideration of the ethical and regulatory issues in ML applications

Only studies meeting high-quality standards were included in the final synthesis.

Overview of ML in GBPM

ML has emerged as a revolutionary technology in GBPM, offering novel solutions for the treatment, diagnosis, and understanding of rare genetic disorders. This section provides an extensive overview of the key applications, challenges, advancements, case studies, and future opportunities of ML in this transformative field.

Key applications of ML in GBPM

ML has enabled significant advancements in several critical areas of GBPM. ML algorithms play a pivotal role in analyzing high-dimensional genomic data to identify pathogenic variants linked to rare genetic disorders. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been used to detect structural variations, single nucleotide polymorphisms, and copy number variations [52].

Biomarker discovery is crucial for understanding disease progression and designing targeted therapies [53].

The application of ML to drug discovery has significantly reduced the time and cost associated with the development of therapies for rare diseases [54]. Moreover, ML-driven drug repurposing frameworks have successfully suggested new therapeutic uses for existing drugs, such as antiepileptic medications for treating rare mitochondrial disorders [55].

Advancements in ML for genomics

Advancements in computational efficiency have enabled real-time analysis of genomic data. High-performance ML algorithms are used in clinical settings to rapidly identify actionable genetic mutations and ensure timely intervention for critical conditions, such as neonatal genetic disorders [56]. Table 5 provides an overview of the diverse applications of ML in GBPM, detailing the techniques used and specific examples of their implementation.

Table 5.

Applications of ML in GBPM

Application | Techniques | Examples | References
Disease diagnostics | CNNs, RNNs | Duchenne Muscular Dystrophy, Fragile X Syndrome | [57, 58]
Biomarker discovery | Random Forest, Feature Selection | Rare cancers, metabolic disorders | [59, 60]
Therapeutic target identification | GNNs, GANs | ALS, spinocerebellar ataxia | [61, 62]
Drug repurposing and development | Autoencoders | Anti-epileptic drugs for mitochondrial disorders | [63, 64]
Personalized treatment optimization | Reinforcement learning | Cystic fibrosis | [65, 66]

Case studies in ML applications

Several case studies illustrate the transformative impact of ML in GBPM:

  1. ML in Rare Disease Diagnosis: A CNN-based approach was successfully employed to identify pathogenic variants of Rett Syndrome.

  2. Drug Repurposing for Rare Disorders: A DL model identified novel therapeutic uses of the anti-inflammatory drug celecoxib in treating rare mitochondrial disorders, demonstrating a cost-effective approach to drug discovery [67].

  3. Multi-Omics Integration in Rare Cancer Biomarker Discovery: A hybrid ML model integrating genomic, transcriptomic, and proteomic data identified biomarkers for early detection of rare sarcomas, enhancing survival rates through timely intervention [68, 69].

Opportunities for future research

Creating standardized and accessible genomic datasets across diverse populations will address data scarcity and improve model performance [70]. By integrating ML with cutting-edge technologies such as CRISPR and single-cell sequencing, researchers can pave the way for groundbreaking discoveries in therapeutic development [71]. Furthermore, the growing availability of patient-specific genomic data will enable highly personalized treatment approaches, further advancing precision medicine [72]. In this context, Table 6 highlights the key advancements in ML for genomics, detailing the innovative techniques employed and their transformative impact on genomic research.

Table 6.

Advancements in ML for genomics

Advancement | Techniques | Impact
Integration of multi-omics data | VAEs | Predicting rare disease phenotypes
XAI in genomics | XAI frameworks | Interpretable insights into gene mutations
Real-time genomic analysis | High-performance ML algorithms | Rapid identification of genetic mutations

Security issues

This section discusses the complexities of genomic data privacy, cybersecurity threats, algorithmic security, and governance. In addition, it presents actionable strategies for mitigating these challenges.

Privacy concerns in genomic data

Genomic data are highly sensitive, as they uniquely identify individuals and reveal information about hereditary conditions, predispositions to diseases, and even ancestral origins. The impact of these data extends to family members and future generations, making breaches particularly harmful. Unauthorized access could lead to genetic discrimination, in which employers, insurers, or third parties exploit this information to deny opportunities or coverage [73]. ML algorithms can combine anonymized genomic data with external datasets (e.g. demographic data, clinical records) to reverse engineer identifiable information. This reduces the effectiveness of traditional anonymization approaches [74].
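
The re-identification risk described above can be illustrated with a toy linkage example. The sketch uses plain exact matching on quasi-identifiers rather than an ML method, which is enough to show why removing names alone does not anonymize a dataset. All records, column names, and helper functions are synthetic and hypothetical.

```python
from collections import Counter


def qi_key(row, keys):
    """Tuple of quasi-identifier values for one row."""
    return tuple(row[k] for k in keys)


def at_risk_samples(released, keys):
    """Sample IDs whose quasi-identifier combination is unique (k = 1)."""
    counts = Counter(qi_key(r, keys) for r in released)
    return [r["sample_id"] for r in released if counts[qi_key(r, keys)] == 1]


def link(released, external, keys):
    """Match released rows to external rows when both sides are unique."""
    rel_counts = Counter(qi_key(r, keys) for r in released)
    ext_counts = Counter(qi_key(e, keys) for e in external)
    names = {qi_key(e, keys): e["name"] for e in external}
    return {
        r["sample_id"]: names[qi_key(r, keys)]
        for r in released
        if rel_counts[qi_key(r, keys)] == 1 and ext_counts[qi_key(r, keys)] == 1
    }


# Synthetic data: a "de-identified" genomic release and an external register.
KEYS = ("birth_year", "zip3", "sex")
released = [
    {"sample_id": "S1", "birth_year": 1980, "zip3": "021", "sex": "F"},
    {"sample_id": "S2", "birth_year": 1980, "zip3": "021", "sex": "F"},
    {"sample_id": "S3", "birth_year": 1975, "zip3": "100", "sex": "M"},
]
external = [
    {"name": "Person A", "birth_year": 1975, "zip3": "100", "sex": "M"},
    {"name": "Person B", "birth_year": 1980, "zip3": "021", "sex": "F"},
]

risky = at_risk_samples(released, KEYS)
linked = link(released, external, KEYS)
```

Here sample S3 is the only record with its combination of birth year, partial postcode, and sex, so it links to a single named person. S1 and S2 share a combination and remain ambiguous under this simple check.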

Cybersecurity threats

Genomic databases are prime targets for cyberattacks because of their immense value. A breach exposes not only personal data, but also valuable research insights that could be exploited for financial gain or industrial espionage. For example, stolen genomic data can be sold on the black market for purposes ranging from identity theft to biological weapon development [75]. Ransomware attacks are increasingly targeting healthcare and research institutions. These attacks involve encrypting the genomic data and demanding payment for decryption. Such incidents disrupt critical research and delay clinical applications, jeopardizing patient care and institutional reputations [76].

Table 7 summarizes the security issues, their impacts, mitigation strategies, and key studies in ML for genomic precision medicine.

Table 7.

Security issues in ML for genomic precision medicine

Aspect | Challenge | Impact | Mitigation Strategy | Ref
Genomic data sensitivity | Unauthorized access & discrimination | Harm to individuals and families | Enhanced data anonymization and encryption methods | [77]
Re-identification risks | Advances in de-anonymization techniques | Compromise of privacy protections | Implementation of stricter access controls and continuous privacy assessments | [78]
Cross-border data exchange | Differing international data protection laws | Legal and ethical compliance issues | Development of international data sharing agreements | [79]
Cybersecurity vulnerabilities | High-value targets for cyberattacks | Financial and data losses | Application of state-of-the-art cybersecurity technologies | [80]
Ethical and regulatory compliance | Complexity of ethical considerations and laws (e.g. GDPR, HIPAA) | Barriers in international research cooperation | Advocacy for and implementation of global ethical standards | [81]
Algorithmic biases | Bias in data sets leads to skewed ML outcomes | Inequitable healthcare outcomes | Use of diverse datasets and bias mitigation algorithms | [82]
Adversarial attacks | Manipulation of ML models via data inputs | Erroneous model outputs affecting healthcare decisions | Deployment of adversarial robustness techniques and model validation processes | [83]

Ethical and legal challenges

Regulations such as the General Data Protection Regulation (GDPR) in Europe and the Health Insurance Portability and Accountability Act (HIPAA) in the United States impose strict requirements on data security and privacy. Although these frameworks aim to protect individuals, their regional nature poses challenges for global genomic collaboration. For example, genomic data collected in one jurisdiction may not meet the compliance standards of another, thus impeding cross-border research [84].

Obtaining meaningful informed consent has become increasingly complex in genomic research, particularly when ML models are used. Participants may not fully understand how their data will be analyzed, shared, or used in secondary applications. Long-term storage of genomic data for future research adds another layer of complexity, as participant consent may not extend indefinitely [85]. Genomic datasets often underrepresent minority populations, leading to biases in ML models. This lack of diversity results in algorithms that perform poorly for underrepresented groups, thereby exacerbating healthcare disparities [86].

Algorithmic security

Adversarial attacks, in which input data are subtly modified to deceive ML models, represent a growing threat. For instance, adversarial perturbations in genomic sequences fed to an ML model can result in the misclassification of genetic variants. This is particularly concerning in clinical settings, where ML predictions inform critical medical decisions [87].
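To make the idea of an adversarial perturbation concrete, the following is a toy sketch rather than a published attack. It assumes a hypothetical linear scorer (`MODEL`, with made-up per-position weights and a made-up `THRESHOLD`) that labels a short sequence as pathogenic or benign. It then searches for the single-base substitution that moves the score furthest toward the opposite label. Attacks on real DL models typically rely on gradients, but the principle is the same: a small input change produces a different prediction.

```python
BASES = "ACGT"

# Hypothetical per-position, per-base weights of a toy linear variant scorer.
MODEL = [
    {"A": 0.2, "C": -0.1, "G": 0.0, "T": 0.1},
    {"A": -0.3, "C": 0.4, "G": 0.1, "T": -0.2},
    {"A": 0.1, "C": 0.0, "G": -0.5, "T": 0.3},
    {"A": 0.0, "C": 0.2, "G": -0.1, "T": -0.4},
]
THRESHOLD = 0.5  # assumed decision threshold


def score(seq):
    """Sum the weight of the observed base at each position."""
    return sum(MODEL[i][base] for i, base in enumerate(seq))


def predict(seq):
    return "pathogenic" if score(seq) > THRESHOLD else "benign"


def most_damaging_substitution(seq):
    """Return (position, base) of the one-base edit that pushes the score
    furthest toward the opposite class, or None if no edit helps."""
    direction = -1.0 if score(seq) > THRESHOLD else 1.0
    best_edit, best_gain = None, 0.0
    for i, current in enumerate(seq):
        for base in BASES:
            if base == current:
                continue
            gain = direction * (MODEL[i][base] - MODEL[i][current])
            if gain > best_gain:
                best_edit, best_gain = (i, base), gain
    return best_edit


original = "ACTC"
position, new_base = most_damaging_substitution(original)
adversarial = original[:position] + new_base + original[position + 1:]
print(original, predict(original), "->", adversarial, predict(adversarial))
```

With these assumed weights, one substitution is enough to change the label, which is why model validation against perturbed inputs matters in clinical settings.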

Figure 5 visually represents the focus on various genomic data security challenges, emphasizing that categories such as mitigation strategies and privacy concerns are key priorities in addressing security issues [88–90].

Figure 5.

The figure presents the proportional distribution of key challenges in genomic data security as discussed in recent literature. Mitigation strategies (30%) and privacy concerns (25%) receive the most attention, highlighting critical areas of risk management. Other concerns include cybersecurity threats (20%), ethical challenges (15%), and algorithmic security (10%), indicating a multifaceted landscape of genomic data protection.

Proportional focus on genomic data security challenges.

Mitigation strategies

Cutting-edge cryptographic methods, such as homomorphic encryption and secure multiparty computation, allow computations to be performed on protected data without revealing the raw data. These methods enable collaborative research while maintaining stringent data privacy [91].
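As a hedged illustration of the secure multiparty computation idea (homomorphic encryption is not shown), the sketch below uses additive secret sharing, one common building block. The institution names, counts, and the modulus `PRIME` are assumptions chosen for the example. Each institution splits its private count into random shares, each party sums the shares it receives, and only the combined total is ever reconstructed.

```python
import secrets

PRIME = 2**61 - 1  # assumed modulus; must exceed any possible total


def share(value, n_parties):
    """Split value into n_parties random shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    return sum(shares) % PRIME


# Hypothetical private carrier counts held by three institutions.
private_counts = {"hospital_a": 12, "hospital_b": 7, "hospital_c": 30}
n_parties = len(private_counts)

# Each institution sends one share to every party (row = institution, column = party).
distributed = [share(count, n_parties) for count in private_counts.values()]

# Each party adds up the shares it received; a single share reveals nothing on its own.
partial_sums = [sum(column) % PRIME for column in zip(*distributed)]

total = reconstruct(partial_sums)
print(total)
```

The parties learn the aggregate count without any of them seeing another institution's individual value, provided they do not collude by pooling all their shares.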

Federated learning (FL) is a decentralized procedure that trains ML models across multiple institutions without transferring raw genomic data [88]. Blockchain technology provides immutable audit trails for genomic data. Each transaction involving data access or modification is transparently recorded, ensuring accountability and preventing unauthorized changes [92].
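The tamper-evidence property behind such audit trails can be sketched with a simple hash chain. This is only an illustration: the `AuditLog` class and its field names are hypothetical, and a real blockchain additionally replicates the ledger across nodes and uses a consensus protocol, which this sketch omits. Each entry stores the hash of the previous entry, so changing any past record invalidates the chain.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def _digest(body):
    """Deterministic SHA-256 digest of an entry body."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class AuditLog:
    """Append-only log in which every entry commits to its predecessor."""

    def __init__(self):
        self.entries = []

    def append(self, actor, action, dataset):
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {
            "index": len(self.entries),
            "actor": actor,
            "action": action,
            "dataset": dataset,
            "prev_hash": prev_hash,
        }
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self):
        """Recompute every hash and check that the links are intact."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("researcher_1", "read", "cohort_x")
log.append("researcher_2", "read", "cohort_x")
log.append("admin_1", "update_annotation", "cohort_x")
print(log.verify())
```

Editing any stored field of an earlier entry causes `verify()` to return `False`, which is the accountability guarantee the text refers to.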

Figure 6 shows the relative importance of various mitigation strategies in genomic ML security, emphasizing their roles in enhancing data protection and integrity. These strategies collectively address the pressing challenges in genomic data security, aligning with emerging global standards [88, 89, 93–95].

Figure 6.

The figure highlights the perceived importance of various mitigation strategies in enhancing genomic ML security. Real-time threat monitoring and federated learning are rated highest, reflecting a strong emphasis on proactive and privacy-preserving techniques. Other key approaches include encryption, bias mitigation, and blockchain for data integrity, underscoring a multifaceted defense framework.

Importance of mitigation strategies in genomic ML security.

Advancements in GBPM

This section explores the key advancements that have transformed GBPM into the cornerstone of modern healthcare.

NGS technologies

The advent of NGS technology has been revolutionary for GBPM. NGS platforms now enable the rapid sequencing of entire genomes with high accuracy, significantly reducing costs and timeframes. For instance, third-generation sequencing methods such as single-molecule real-time (SMRT) sequencing and nanopore sequencing provide long-read data that resolve structural variations and complex genomic regions more effectively than the earlier short-read technologies [96].

NGS has paved the way for routine clinical applications such as the diagnosis of rare genetic disorders, identification of cancer mutations, and guidance of personalized therapies. Whole-genome sequencing (WGS) and whole-exome sequencing are increasingly integrated into diagnostic workflows, enabling the detection of pathogenic variants with unprecedented precision [97].

Figure 7 shows the varying impacts of different technological advancements in GBPM on crucial aspects such as accuracy, speed, cost effectiveness, and user friendliness.

Figure 7.

The figure evaluates the impact of various technologies on four critical aspects of precision medicine: accuracy, speed, cost effectiveness, and user friendliness. Multi-omics integration and explainable AI stand out for user friendliness and cost effectiveness, respectively, while real-time analysis excels in accuracy and speed. The heatmap provides a comparative score (1–10 scale) to highlight strengths and trade-offs across technology domains.

Impacts of technologies on different aspects in precision medicine.

Integration of multi-omics data

The integration of genomic, transcriptomic, proteomic, and metabolomic data has opened new avenues for understanding complex diseases. Multi-omics approaches provide a holistic view of biological systems, revealing interactions between genes, proteins, and metabolites that drive disease mechanisms [98].

ML algorithms play a pivotal role in integrating and analyzing multi-omics datasets. Advanced techniques such as GNNs and autoencoders are now used to identify key biomarkers and regulatory pathways [60].

ML in precision diagnostics

ML models have transformed the identification and classification of pathogenic variants in genomic data. Tools such as DeepVariant, which applies a deep neural network to call variants from sequencing reads, and PolyPhen and MutPred, which use trained classifiers to predict whether amino acid substitutions are likely to be benign or damaging, support variant interpretation with high precision. These advancements are particularly beneficial for diagnosing rare genetic conditions for which conventional methods may fall short [99].

ML-powered precision diagnostics have significantly improved the diagnostic yield for rare diseases. For example, ensemble learning methods that combine multiple algorithms outperform traditional single-method approaches in identifying disease-causing mutations [100].
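A contrived sketch can show why combining classifiers may help. The labels and predictions below are invented, and the three "classifiers" are just hard-coded prediction lists chosen so that each one errs on a different variant. Under that assumption of uncorrelated errors, a simple majority vote recovers the correct label everywhere, which is the intuition behind ensemble methods. Real ensembles rarely have errors this cleanly separated.

```python
def majority_vote(*prediction_lists):
    """Combine binary predictions (1 = pathogenic, 0 = benign) by majority."""
    ensemble = []
    for votes in zip(*prediction_lists):
        ensemble.append(1 if sum(votes) * 2 > len(votes) else 0)
    return ensemble


def accuracy(predicted, truth):
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical ground truth for five variants.
truth = [1, 0, 1, 1, 0]

# Hypothetical outputs of three classifiers, each wrong on a different variant.
clf_a = [1, 0, 0, 1, 0]
clf_b = [1, 1, 1, 1, 0]
clf_c = [0, 0, 1, 1, 0]

ensemble = majority_vote(clf_a, clf_b, clf_c)
for name, preds in [("A", clf_a), ("B", clf_b), ("C", clf_c), ("ensemble", ensemble)]:
    print(name, accuracy(preds, truth))
```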

Advancements in therapeutics and drug development

Recent advancements in genome editing methodologies such as CRISPR-Cas9 and base editing have revolutionized therapeutic development. ML models have been used to predict off-target effects, optimize guide RNA design, and identify potential therapeutic targets [101].
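As a rough illustration of the off-target problem, the sketch below is a plain rule-based scan, not an ML model and not any published tool. It assumes an SpCas9-style NGG PAM, searches only the forward strand of a made-up sequence, and reports sites whose protospacer differs from the guide by at most a few mismatches. The guide, the genome string, and the function names are hypothetical. Mismatch counts of this kind are among the simple features that learned off-target predictors can build upon.

```python
def mismatches(a, b):
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def find_candidate_sites(genome, guide, max_mismatches=3):
    """Return (start, mismatch_count) for forward-strand sites followed by an NGG PAM."""
    n = len(guide)
    hits = []
    for start in range(len(genome) - n - 2):
        protospacer = genome[start:start + n]
        pam = genome[start + n:start + n + 3]
        if pam[1:] != "GG":  # require N-G-G immediately after the protospacer
            continue
        distance = mismatches(guide, protospacer)
        if distance <= max_mismatches:
            hits.append((start, distance))
    return hits


# Hypothetical 20-nt guide and a near-match that differs at two positions.
guide = "GACGTTAGCCATGCAATCGT"
near_match = "T" + guide[1:10] + "C" + guide[11:]

# Toy sequence: perfect site with PAM, near-match with PAM, perfect site without PAM.
genome = "TT" + guide + "AGG" + "CC" + near_match + "TGG" + "AA" + guide + "TTT"

hits = find_candidate_sites(genome, guide)
print(hits)
```

The third copy of the guide is not reported because it lacks a PAM, while the two-mismatch site is flagged as a potential off-target.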

ML has accelerated drug discovery and repurposing by enabling the identification of novel therapeutic uses for existing drugs. Algorithms trained on genomic and chemical databases have been used to propose candidate treatments for rare metabolic and neurological disorders. These advancements have reduced the time and costs associated with traditional drug development processes [102].

XAI in genomics

The rise of XAI has addressed the black-box nature of traditional ML models. XAI methods provide interpretable insights into the link between specific genetic variants and disease phenotypes. This has improved clinician trust and facilitated regulatory approval of AI-driven diagnostic tools [103]. XAI models have enhanced the discovery of biomarkers by highlighting the most relevant features of multi-omics data. For instance, SHAP (SHapley Additive exPlanations) values are used to rank the importance of genetic and epigenetic markers in predicting disease outcomes, guiding targeted therapeutic development [104].

Integrating XAI into genomic applications therefore remains essential. Future research should focus on designing interpretable models that provide actionable insights, such as highlighting the most relevant genetic features contributing to disease prediction [105, 106].

Techniques such as SHAP, LIME (Local Interpretable Model-Agnostic Explanations), and integrated gradients are increasingly used to provide insight into feature importance, allowing researchers to understand which genes, SNPs, or regulatory elements influence predictions. These methods are essential in clinical genomics, where explainability fosters trust and facilitates biological discovery. However, most XAI tools were originally developed for generic tabular or image data and may not capture the hierarchical and interdependent nature of genomic information. Genomic data often involves intricate biological networks, nonlinear gene interactions, and complex regulatory effects that challenge the assumptions of many XAI methods [107, 108].

Future research should focus on developing fairness-aware algorithms that can detect and mitigate biases in training datasets. Techniques such as adversarial debiasing, resampling, and reweighting can help make ML predictions more equitable across diverse demographic groups. Multi-omics integration is critical for understanding complex disease mechanisms. Future studies should focus on hybrid ML models that combine genomic, transcriptomic, proteomic, and epigenomic data to provide a holistic view of biological systems [98].
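Of the techniques named above, reweighting is the simplest to sketch. The example assumes hypothetical group labels and uses inverse-frequency weights, so that every group contributes the same total weight to a training loss regardless of how many samples it has. This is one common weighting scheme, not necessarily the one used in the cited studies.

```python
from collections import Counter


def balanced_weights(groups):
    """Inverse-frequency sample weights: n_samples / (n_groups * group_count)."""
    counts = Counter(groups)
    n_samples, n_groups = len(groups), len(counts)
    return [n_samples / (n_groups * counts[g]) for g in groups]


# Hypothetical cohort in which group_b is underrepresented.
groups = ["group_a"] * 8 + ["group_b"] * 2
weights = balanced_weights(groups)

# Total weight per group after reweighting.
totals = Counter()
for group, weight in zip(groups, weights):
    totals[group] += weight
print(dict(totals))
```

Each sample's weight would then multiply its loss term during training, so errors on the smaller group are not drowned out by the majority group.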

Table 8 provides a detailed overview of the significant advancements in GBPM, categorizing them by technology, applications, benefits, and examples across various subcategories.

Table 8.

Detailed overview of advancements in GBPM

Category | Subcategory | Technology/Method | Applications | Benefits | Examples/Details
NGS technologies | Enhanced sequencing | SMRT, nanopore | Whole genome sequencing | High accuracy, speed | Resolves structural variations
NGS technologies | Clinical applications | NGS platforms | Diagnostics, personalized therapies | Increased diagnostic yield | Cardiomyopathies, mitochondrial disorders
Multi-omics data integration | Comprehensive insights | Multi-omics (genomic, etc.) | Disease mechanism analysis | Holistic view of biological systems | Reveals gene-protein-metabolite interactions
Multi-omics data integration | ML integration | GNNs, autoencoders | Biomarker identification | Improves accuracy of disease classification | Predicts disease phenotypes
ML in diagnostics | Variant prediction | DeepVariant, PolyPhen | Mutation classification | High precision | Classifies genetic mutations
ML in diagnostics | Diagnostic yield improvement | Ensemble learning | Rare disease diagnosis | Better performance in complex cases | Identifies disease-causing mutations
Therapeutics and drug development | Target identification | CRISPR-Cas9, base editing | Gene editing | Precise, safe | Optimizes guide RNA design
Therapeutics and drug development | Drug repurposing | ML | Drug discovery | Reduces development time and cost | Suggests new therapeutic uses for existing drugs
Real-time genomic analysis | High-performance computing | Distributed computing | Large dataset analysis | Enables real-time processing | Analyzes terabytes of data within hours
Real-time genomic analysis | Clinical decision-making | Rapid WGS | Neonatal intensive care diagnostics | Timely interventions | Diagnoses genetic disorders in critically ill infants
XAI in genomics | Trust and adoption | XAI | AI model interpretability | Improves clinician trust | Explains genetic variant-disease links
XAI in genomics | Biomarker discovery | SHAP values | Biomarker identification | Guides targeted therapeutic development | Ranks importance of genetic markers
Emerging technologies | Genome editing | CRISPR, ML | Treating genetic disorders | Precise editing protocols | Used in sickle cell and Duchenne muscular dystrophy
Emerging technologies | Single-cell genomics | ML | Cellular heterogeneity analysis | Identifies rare cell populations | Personalized therapies in cancer and immunology

Large language models for genomics

Large Language Models (LLMs), which have demonstrated exceptional performance in natural language processing, remain underexplored in the domain of genomic sequence analysis. Unlike traditional ML approaches that rely heavily on handcrafted features and domain-specific encodings, LLMs offer a unique capacity to model complex dependencies and long-range interactions in biological sequences. For instance, models like DNABERT treat DNA sequences as a language, applying bidirectional transformers to k-mer tokenized input and learning context-aware representations for genomic tasks such as promoter identification and splice site detection [109]. Despite their promise, the integration of LLMs in genomics has been limited, often overshadowed by conventional convolutional or recurrent models. This gap presents a fertile area for research, particularly as sequence databases grow exponentially and the need for interpretable, high-capacity models intensifies.
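The k-mer tokenization step mentioned above can be sketched in a few lines. This is an illustrative sketch and not DNABERT's actual tokenizer code; the function names and the tiny example sequence are assumptions. It slides a window of length k over the sequence with a stride of one, producing overlapping tokens, and then assigns integer IDs as a language model vocabulary would.

```python
def kmer_tokenize(sequence, k=6):
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    sequence = sequence.upper()
    if k <= 0 or k > len(sequence):
        raise ValueError("k must be between 1 and the sequence length")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(tokens):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


tokens = kmer_tokenize("ATGCGTAC", k=3)
vocab = build_vocab(tokens)
token_ids = [vocab[t] for t in tokens]
print(tokens)
print(token_ids)
```

A sequence of length L yields L - k + 1 tokens, and the resulting ID sequence is what a transformer encoder would receive as input.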

Emerging models like the Nucleotide Transformer, which scales to billions of parameters and has been pretrained on massive nucleotide datasets, demonstrate the feasibility of adapting large-scale transformer architectures for biological applications [110]. These models not only capture biologically relevant patterns without supervision but also exhibit strong transferability across diverse tasks such as variant effect prediction, chromatin accessibility modeling, and evolutionary conservation estimation. As the field transitions toward more data-intensive and hypothesis-free frameworks, LLMs offer a compelling paradigm shift, moving from manually curated pipelines to end-to-end learnable systems capable of extracting nuanced biological insight.

Limitations

Despite significant advancements in GBPM, several limitations have hindered its full realization. These obstacles span the technical, ethical, regulatory, and practical domains, thereby affecting the adoption and effectiveness of ML and genomic technologies. This section explores these limitations in detail, highlighting the complexities of implementing precision medicine.

Data-related limitations

One of the primary challenges of GBPM is the limited availability of high-quality and diverse genomic datasets. Rare genetic disorders inherently suffer from small sample sizes, which restricts the ability of ML models to generalize effectively. Existing datasets also underrepresent many populations, and this lack of diversity not only exacerbates healthcare disparities but also reduces the global applicability of ML-driven insights. Addressing data imbalance requires a concerted effort to curate diverse and representative datasets, alongside techniques such as data augmentation and synthetic data generation to balance the training data [86, 111].

The quality of genomic data can vary significantly, with issues such as sequencing errors, missing values, and noise affecting model reliability. Variability in sequencing platforms, data pre-processing pipelines, and annotation standards introduces heterogeneity, which complicates data integration and analysis [112].

Algorithmic and computational limitations

Analysis of high-dimensional genomic data requires substantial computational resources, including high-performance computing systems and optimized algorithms. This demand increases further when integrating multi-omics data or scaling analyses to population-level datasets. Institutions with a limited computational infrastructure, particularly in low- and middle-income countries, face significant barriers to participating in genomic research [113].

Many ML models, particularly DL architectures, have been criticized for their lack of interpretability. These black-box models provide predictions without explaining the underlying reasoning, which is particularly problematic in clinical settings, where decisions must be transparent and justifiable. The inability to interpret model outputs can erode the trust between clinicians and patients, limiting adoption [114].

Ethical and privacy concerns

Genomic data contain highly sensitive information that can reveal an individual’s identity, disease risk, and familial relationships. The risk of data breaches or unauthorized access can have far-reaching implications, including discrimination in employment, insurance, and social contexts. Ensuring robust data protection is critical but challenging given the increasing sophistication of cyber threats [73].

Participants may not fully understand the implications of sharing genomic information, particularly in the context of long-term storage or future use by third parties. These ethical concerns necessitate greater transparency and dynamic consent models that allow participants to update their preferences over time [115, 116].

Clinical integration limitations

Although ML models can identify patterns and associations in genomic data, translating these findings into actionable clinical decisions is challenging. Many genomic variants identified using ML lack sufficient evidence to guide treatment, resulting in a gap between research and clinical practice. Bridging this gap requires interdisciplinary collaboration and validation studies to establish clinical utility [117].

Regulatory and policy limitations

The absence of standardized guidelines for validating and approving ML models in genomics poses a major regulatory hurdle. Existing frameworks often fail to address the unique challenges of ML applications, such as algorithmic transparency, reproducibility, and adaptation to evolving data. Regulatory frameworks also vary widely across countries, which creates challenges for international collaboration. For example, data sharing across borders is often constrained by different privacy laws and compliance requirements. This lack of harmonization impedes global scalability [118].

Limitations in research and development

Although multi-omics approaches offer a comprehensive view of biological systems, their integration remains technically challenging. The lack of standardized formats, analytical pipelines, and computational tools hampers the effective combination of diverse data types such as genomics, transcriptomics, and proteomics. These gaps limit the ability to uncover complex disease mechanisms and identify possible therapeutic targets [119].

Genomic sequencing and downstream analyses remain expensive despite decreasing costs. This financial barrier limits access to precision medicine in resource-constrained settings and perpetuates healthcare inequities [120].

Addressing challenges and moving forward

Sustained funding for genomic research and infrastructure development is critical for overcoming the cost barriers that limit access to precision medicine in resource-constrained settings [121, 122].

Future advancements should focus on the development of interpretable ML models tailored to genomic applications. XAI techniques such as feature attribution and visualization tools can improve transparency and trust. Additionally, fairness-aware algorithms that mitigate biases in training data are essential for ensuring equitable outcomes [123].

Global regulatory bodies must collaborate to establish harmonized frameworks for genomic data protection and ML model validation. Dynamic consent models that allow participants to adjust their preferences over time should also be adopted to enhance ethical transparency [124].

Table 9 summarizes the key challenges, examples, solutions, and references in genomic data analysis, highlighting barriers and actionable strategies for advancing genomic research and precision medicine.

Table 9.

Challenges, examples, solutions, and references in genomic data analysis (2022–2025)

Category | Challenge | Examples | Solutions | Ref
Data scarcity | Limited access to diverse genomic datasets; imbalance in variant representation | Rare disease datasets; bias towards European ancestry | Build globally diverse datasets; Use data augmentation and synthetic data generation techniques. | [125]
Data quality | Variability in sequencing platforms and errors in data | Inconsistent variant calling between platforms | Standardize sequencing protocols; Enhance preprocessing and noise-reduction algorithms. | [126]
Computational demands | Need for high-performance infrastructure for analyzing genomic data | High costs in low-resource settings | Develop optimized ML models; Leverage distributed and cloud computing. | [127]
Model transparency | Difficulty in interpreting “black-box” ML models | DL predictions lacking clinical rationale | Focus on XAI techniques; Incorporate visualization tools and feature attribution. | [128]
Ethical concerns | Privacy risks and challenges in informed consent | Data breaches; unclear secondary use policies | Implement dynamic consent models; Strengthen encryption and access control measures. | [129]
Clinical integration | Barriers in embedding genomic insights into healthcare workflows | Incompatibility with EHR systems; lack of clinician expertise | Upgrade EHR systems; Train healthcare professionals in genomic medicine. | [130]
Global disparities | Variability in data-sharing policies and standards across countries | Conflicting privacy regulations, limited collaboration | Harmonize global frameworks; Encourage international genomic research initiatives. | [131]
High costs | Financial barriers in genomic sequencing and research | Limited adoption in resource-constrained settings | Promote cost-efficient sequencing; Increase public and private funding for genomic research. | [132]

Challenges and future directions

This section explores the key barriers and offers detailed insights into the directions for future research and applications.

Key challenges

  1. Data Scarcity and Quality Issues: Rare genetic disorders, by their very nature, affect a small fraction of the population, making it difficult to obtain large, high-quality datasets. ML models require substantial training data for robustness and accuracy [1].

    Moreover, the quality of genomic data is often compromised by noise, missing values, or inconsistencies in the sequencing methods. Low-quality data can introduce inaccuracies into ML models, reducing their predictive power and limiting their clinical applicability [60].

  2. Ethical and Privacy Concerns: Ethical concerns arise around the collection, storage, and use of genomic data, particularly when participants may not fully understand how their information will be used [133].

    In addition, regulatory frameworks such as GDPR and HIPAA impose strict privacy requirements that can conflict with the need for collaborative genomic research. Balancing research needs against the protection of individual rights is an ongoing challenge, particularly as ML applications grow more sophisticated and demand larger datasets [81].

  3. Computational Complexity and Scalability: Genomic data are inherently high dimensional, with millions of genetic variants per individual. This complexity grows sharply when integrating multi-omics data or analyzing population-scale datasets. Training ML models on such data requires significant computational resources, including high-performance computing clusters and optimized algorithms. Small institutions or developing regions may lack access to these resources, creating disparities in their genomic research capabilities [134].

    Current algorithms often struggle to provide timely results without compromising accuracy. Furthermore, as datasets grow in size, the storage and processing requirements become increasingly challenging, highlighting the need for scalable solutions [135].

  4. Model Interpretability and Trust: Many ML models, particularly DL architectures, are black-box models that provide predictions without clear explanations. In clinical settings where trust and transparency are paramount, this lack of interpretability hinders adoption. Clinicians must understand why a model makes a particular prediction to ensure that the recommendation aligns with medical knowledge and practice [136].

Table 10 presents a systematic comparison of the key challenges, methodologies, and outcomes across studies, underscoring the significant advancements in genomic research.

Table 10.

Comparison of key challenges, methods, and results from different studies

Key Applications | Key Challenges Addressed | Methods Used | Results | Ref
Improved genomic data preprocessing | Data scarcity and noise in genomic datasets | Advanced preprocessing techniques for handling noise and missing values | Improved predictive accuracy in genomic analysis | [137]
Ethical and secure genomic data sharing | Ethical and privacy concerns in genomic research | GDPR-compliant data sharing frameworks and anonymization techniques | Enhanced data protection and ethical compliance | [133]
Efficient analysis of large genomic datasets | Computational complexity of high-dimensional genomic data | High-performance computing and optimized algorithms | Reduced computational time with high accuracy | [138]
Enhanced clinical decision-making using AI | Model interpretability in clinical applications | Development of interpretable ML models and XAI methods | Increased clinician trust and regulatory compliance | [139]
Multi-omics research in personalized medicine | Integration of multi-omics data | Multi-modal data harmonization using advanced integration techniques | Improved clinical applicability and multi-omics insights | [140]

Future directions

  1. Development of Universal and Diverse Genomic Datasets: Future studies should focus on creating large, diverse, and representative genomic datasets. International collaborations such as the Global Alliance for Genomics and Health (GA4GH) can help establish standardized protocols for data collection, storage, and sharing [141].

    Advancements in data augmentation techniques can mitigate data scarcity. Synthetic data generation using generative adversarial networks is a promising approach for creating realistic genomic datasets while preserving privacy [142].

  2. Ethical and Secure Data Sharing Frameworks: Federated learning represents a key innovation in addressing privacy concerns while enabling collaborative research. In this approach, ML models are trained locally on decentralized data, with only the model updates shared among institutions [42].

    Blockchain technology offers another solution by providing immutable records of data access and usage. By creating transparent and auditable trails, blockchain can enhance trust among stakeholders while preventing unauthorized data modifications [94].

  3. Advancements in XAI and Trustworthy Models: XAI is critical for bridging the gap between ML models and clinical practice. Future research should focus on developing XAI techniques that are specifically tailored for genomic applications. Feature attribution methods, which highlight genetic variants contributing to a prediction, can help clinicians understand the rationale behind model outputs [143].

  4. Integration of Multi-Omics and Real-Time Analysis: The integration of multi-omics data, including genomics, transcriptomics, and proteomics, will be the cornerstone of future genomic research. Hybrid ML models capable of analyzing diverse data types will provide a more comprehensive understanding of disease mechanisms [98].

  5. Global Collaboration and Interdisciplinary Research: Addressing the challenges of ML for GBPM requires collaboration across disciplines, including bioinformatics, ML, clinical medicine, and policy. Interdisciplinary teams can bridge the gap between technological innovation and real-world applications [144].

  6. Regulatory and Governance Frameworks: Global regulatory bodies must collaborate to establish standardized guidelines for ML applications in genomics. These frameworks should address issues such as transparency, reproducibility, and fairness while balancing innovation with ethical considerations [145].
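The federated learning approach described in item 2 of the list above can be sketched as follows. This is a minimal illustration of federated averaging on a toy one-variable linear model, with invented site names and synthetic data generated from y = 2x + 1. It is not tied to any particular framework or to the cited studies. Each site trains locally on its own data, and the server only ever sees model parameters, which it averages in proportion to each site's sample count.

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """Run a few gradient-descent steps on one site's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def federated_average(updates, sizes):
    """Average the site models, weighting each by its number of samples."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return w, b


def make_site(xs):
    """Synthetic data drawn from the assumed relationship y = 2x + 1."""
    return [(x, 2.0 * x + 1.0) for x in xs]


# Hypothetical institutions; their raw data never leave the site.
sites = {
    "site_a": make_site([0.0, 0.5, 1.0]),
    "site_b": make_site([1.5, 2.0]),
    "site_c": make_site([0.25, 0.75, 1.25, 1.75]),
}

global_weights = (0.0, 0.0)
for _ in range(50):  # communication rounds
    updates = [local_update(global_weights, data) for data in sites.values()]
    global_weights = federated_average(updates, [len(d) for d in sites.values()])

print(global_weights)
```

The shared model approaches the underlying slope and intercept even though no site covers the full range of inputs. In practice, model updates can still leak information, so federated learning is often combined with secure aggregation or differential privacy.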

Figure 8 compares the current focus and future potential across six key directions in genomic research and ML [14, 146–149].

Figure 8.

The figure compares current research focus and anticipated future potential across six key directions in genomic precision medicine. While areas like XAI and multi-omics integration already receive high attention, all categories show a notable gap between current focus and future potential, indicating greater future prioritization. Regulatory frameworks and global collaboration, currently underemphasized, are projected to gain significant strategic importance.

Comparison of current focus and future potential.

Recommendations

This section outlines the key recommendations for addressing current challenges and advancing the field.

Enhancing data diversity and quality

One of the most pressing requirements of GBPM is the development of large, diverse, and globally representative genomic datasets. Current datasets are heavily biased toward populations of European ancestry, limiting the generalizability of ML models to other ethnic groups. Future initiatives should prioritize the inclusion of underrepresented populations to ensure equitable healthcare outcomes [150].

Advancing ML methodologies

A major limitation arises from the tendency of XAI frameworks to oversimplify feature attributions in highly non-linear models such as deep neural networks and ensemble classifiers. These models can capture complex, non-additive interactions among features, but explanation methods like SHAP or LIME may attribute importance in a linear or additive fashion, which can lead to misleading or biologically implausible interpretations. For instance, a gene may appear important in isolation according to an XAI metric but may only have functional relevance in the context of a pathway or regulatory module. Moreover, explanations are often sensitive to model perturbations and lack consistency across different training instances, raising concerns about their robustness and reproducibility in high-stakes domains like precision medicine. Addressing these limitations will require the development of genomics-aware XAI methods that can model complex interactions and reflect underlying biological mechanisms more faithfully [13, 107, 151].
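The additive-attribution issue can be made concrete with a small worked example. The sketch below computes exact Shapley values, the quantity that SHAP approximates, by enumerating all feature coalitions. The model `epistatic_risk` and the gene names are hypothetical: risk is 1 only when variants in both gene_a and gene_b are present, so neither gene has any effect on its own.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                marginal = value_fn(coalition | {feature}) - value_fn(coalition)
                total += weight * marginal
        phi[feature] = total
    return phi


def epistatic_risk(present):
    """Hypothetical model: risk only when both gene_a and gene_b variants are present."""
    return 1.0 if {"gene_a", "gene_b"} <= present else 0.0


phi = shapley_values(epistatic_risk, ["gene_a", "gene_b", "gene_c"])
print(phi)
```

Each interacting gene receives an attribution of 0.5 and the irrelevant gene receives 0. The attributions sum correctly to the model output, but read in isolation they suggest two independent half-contributions and hide the fact that the effect exists only jointly, which is the kind of misleading interpretation described above.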

Strengthening ethical and privacy frameworks

Obtaining informed consent is a dynamic process that must evolve with the expanding applications of genomic data. Future frameworks should adopt dynamic consent models that allow participants to update their preferences when new research applications emerge [95, 136].
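As a rough illustration of what a dynamic consent model might look like in software, the sketch below keeps a per-participant record of purpose-specific permissions together with a timestamped audit trail. The class name, field names, and purpose strings are hypothetical; a real system would additionally need authentication, secure storage, and legal review.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRecord:
    participant_id: str
    permissions: dict = field(default_factory=dict)  # purpose -> bool
    history: list = field(default_factory=list)      # audit trail of every change

    def update(self, purpose, allowed):
        """Record the participant's current choice for one research purpose."""
        self.permissions[purpose] = allowed
        self.history.append((datetime.now(timezone.utc), purpose, allowed))

    def permits(self, purpose):
        # Purposes the participant has never been asked about are denied by default.
        return self.permissions.get(purpose, False)


record = ConsentRecord("P001")
record.update("rare_disease_diagnosis", True)
print(record.permits("rare_disease_diagnosis"))     # consent was granted above
print(record.permits("drug_repurposing_research"))  # never asked, so denied

# A new research application emerges; the participant opts in, then later withdraws.
record.update("drug_repurposing_research", True)
record.update("drug_repurposing_research", False)
print(record.permits("drug_repurposing_research"))
```

The default-deny rule is the key design choice here: data are used for a new purpose only after the participant has explicitly opted in, and the history list preserves earlier choices for auditing.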

Improving computational infrastructure and scalability

The growing scale of genomic data necessitates advancements in high-performance computing. Distributed computing frameworks, cloud-based platforms, and optimized algorithms are critical for managing large-scale datasets and performing real-time genomic analyses. Investments in computational infrastructure will help institutions worldwide participate in precision medicine research.

Automated data pre-processing, variant annotation, and ML pipeline execution can significantly reduce the time required for genomic analysis. Future pipelines should incorporate real-time processing capabilities to enable applications such as rapid neonatal diagnostics or time-sensitive cancer therapies [152].

Integration with emerging technologies

The integration of ML with genome-editing technologies such as CRISPR-Cas9 has immense potential for therapeutic innovation. ML models can optimize guide RNA design, predict off-target effects, and identify novel gene-editing targets. Future research should explore these synergies to accelerate the development of precision therapeutics [153].

Single-cell sequencing technology provides unparalleled insights into cellular heterogeneity. ML models should be developed to analyze single-cell datasets, enabling the identification of rare cell populations and their roles in disease progression. These advancements could revolutionize personalized medicine, particularly in oncology and immunotherapy [154]. One notable tool is scVI (single-cell Variational Inference), which uses a deep generative model based on variational autoencoders to model the gene expression distribution across cells. It provides a robust latent space representation while correcting for batch effects and capturing biological variation [155]. Another emerging technique is scGNN, which introduces a graph neural network-based approach for learning gene-cell relationships, thereby enhancing clustering accuracy and data imputation [156].
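For readers unfamiliar with variational autoencoders, the training objective behind models such as scVI can be summarized by the standard evidence lower bound (ELBO). The expression below is a simplified, generic form written under stated assumptions: x denotes the expression counts of one cell, s its batch label, z the latent representation, and θ and φ the decoder and encoder parameters. The published scVI model contains further components (for example, a treatment of per-cell library size) that are omitted here.

```latex
\log p_\theta(x \mid s) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x, s)}\big[\log p_\theta(x \mid z, s)\big]
\;-\; \mathrm{KL}\big(q_\phi(z \mid x, s)\,\|\,p(z)\big)
```

The first term rewards accurate reconstruction of the observed counts, and the second keeps the latent representation close to the prior. Conditioning the decoder on s is what allows batch effects to be explained outside of z, so that the latent space can concentrate on biological variation.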

Figure 9 highlights the weighted contributions across key areas in GBPM, emphasizing data diversity, advanced ML methodologies, ethical frameworks, computational tools, global collaboration, and interdisciplinary education [157–160].

Figure 9.

The figure presents the total weighted contributions of various domains in advancing genomic ML initiatives. ML methodologies and ethical/privacy frameworks emerge as the most influential categories. Meanwhile, areas such as interdisciplinary education and global collaboration show room for greater integration to support holistic innovation.

Emphasizing total contributions across categories.

Discussion and interpretation of results

This section synthesizes the key takeaways from this review, highlighting the progress made, obstacles encountered, and future directions required to advance this field.

Summary of key advancements

The advent of NGS and multi-omics integration has been pivotal in enabling GBPM. These technologies have provided unprecedented resolution for understanding the genetic and molecular basis of diseases. Clinical applications, such as rapid whole-genome sequencing for neonatal disorders and personalized cancer therapies, exemplify the transformative impact of these advancements.

ML has emerged as a critical enabler in precision medicine, enhancing our ability to analyze high-dimensional genomic data, identify pathogenic variants, and predict disease phenotypes. The integration of ML with XAI techniques has improved the interpretability of predictions, fostering trust and adoption in clinical settings. ML-driven innovations in drug repurposing and therapeutic target identification have accelerated the development of personalized treatments.

The path forward

The future of GBPM lies in its ability to deliver equitable healthcare outcomes. This requires concerted efforts to build global data-sharing frameworks, harmonize regulatory standards, and ensure access to genomic technologies in low- and middle-income countries. Collaborative initiatives such as international genomic consortia and public-private partnerships play a pivotal role in achieving these goals. Integrating emerging technologies, such as CRISPR-based genome editing, single-cell sequencing, and federated learning, with ML holds immense promise for advancing precision medicine.

Future advancements in ML methodologies, including fairness-aware algorithms, multi-modal data integration, and XAI, will enhance the accuracy, interpretability, and reliability of genomic analyses. These innovations will help ensure that ML models are scientifically robust as well as ethically and socially responsible.

Conclusion

By facilitating more precise diagnosis, customized therapies, and quicker drug development, ML is propelling revolutionary advancements in GBPM for rare genetic diseases. ML has greatly enhanced disease classification and biomarker identification through the development of models that can analyze complex, high-dimensional genomic data. However, real-world implementation remains constrained by issues such as small and imbalanced datasets, computational complexity, ethical dilemmas, and limited model comprehensibility. To overcome these obstacles and build trust in healthcare environments, explainable AI must be combined with strong data-sharing frameworks and fairness-aware algorithms. This review emphasizes the necessity of cross-disciplinary collaboration to address these issues and presents a thorough research roadmap that outlines practical strategies for transforming ML developments into impactful and equitable precision healthcare.

Key Points

  • Presents a comprehensive and up-to-date assessment of ML’s current status in GBPM.

  • Seeks to address literature gaps, focusing on real-world challenges and ethical considerations.

  • Identifies knowledge and practice gaps preventing ML from reaching its full potential in this area.

  • Serves as a roadmap for future research by proposing actionable recommendations.

Acknowledgments

This research was supported by Sungkyunkwan University and the BK21 FOUR (Graduate School Innovation), funded by the Ministry of Education (MOE, Korea) and the National Research Foundation of Korea (NRF). It was also supported by National Research Foundation (NRF) grants funded by the Ministry of Science and ICT (MSIT) and Ministry of Education (MOE), Republic of Korea (NRF[2021R1-I1A2(059735)], RS[2024-0040(5650)], RS[2024-0044(0881)], and RS[2019-II19(0421)]).

Contributor Information

Syed Raza Abbas, Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea.

Zeeshan Abbas, Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea; Department of Artificial Intelligence, Sungkyunkwan University, Suwon 16419, Republic of Korea.

Arifa Zahir, Department of Biomedical and Robotics Engineering, Incheon National University, Songdo, Republic of Korea.

Seung Won Lee, Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea; Department of Artificial Intelligence, Sungkyunkwan University, Suwon 16419, Republic of Korea; Department of Metabiohealth, Sungkyunkwan University, Suwon 16419, Republic of Korea; Personalized Cancer Immunotherapy Research Center, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea; Department of Family Medicine, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, 29 Saemunan-ro, Jongno-gu Seoul 03181, Republic of Korea.

Conflict of interest

No competing interest is declared.

References

  • 1. Hong  J, Lee  D, Hwang  A. et al.  Rare disease genomics and precision medicine. Genomics Inform  2024;22:1–11. 10.1186/s44342-024-00032-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Choon  YW, Choon  YF, Nasarudin  NA. et al.  Artificial intelligence and database for ngs-based diagnosis in rare disease. Front Genet  2024;14:1258083. 10.3389/fgene.2023.1258083 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Tahir  M, Naeem  A, Malik  H. et al.  Dscc_net: multi-classification deep learning models for diagnosing of skin cancer using dermoscopic images. Cancers  2023;15:2179. 10.3390/cancers15072179 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Roman-Naranjo  P, Parra-Perez  AM, Lopez-Escamez  JA. A systematic review on machine learning approaches in the diagnosis of rare genetic diseases. medRxiv  2023;2023. [DOI] [PubMed] [Google Scholar]
  • 5. Quazi  S. Artificial intelligence and machine learning in precision and genomic medicine. Med Oncol  2022;39:120. 10.1007/s12032-022-01711-1 [DOI] [PMC free article] [PubMed] [Google Scholar] [Retracted]
  • 6. Wang  R, Helbig  I, Edmondson  AC. et al.  Splicing defects in rare diseases: transcriptomics and machine learning strategies towards genetic diagnosis. Brief Bioinform  2023;24:bbad284. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Wang  YC, Wu  Y, Choi  J. et al.  Computational genomics in the era of precision medicine: applications to variant analysis and gene therapy. J Pers Med  2022;12:175. 10.3390/jpm12020175 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Lee  J, Liu  C, Kim  J. et al.  Deep learning for rare disease: a scoping review. J Biomed Inform  2022;135:104227. 10.1016/j.jbi.2022.104227 [DOI] [PubMed] [Google Scholar]
  • 9. Mao  D, Liu  C, Wang  L. et al.  AI-marrvel—a knowledge-driven AI system for diagnosing mendelian disorders. NEJM AI  2024;1. 10.1056/AIoa2300009 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Wu  D, Yang  J, Liu  C. et al.  Gestaltmml: enhancing rare genetic disease diagnosis through multimodal machine learning combining facial images and clinical texts. ArXiv  2024. [Google Scholar]
  • 11. Kim  J, Wang  K, Weng  C. et al.  Assessing the utility of large language models for phenotype-driven gene prioritization in the diagnosis of rare genetic disease. The American Journal of Human Genetics  2024;111:2190–202. 10.1016/j.ajhg.2024.08.010 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Abbas  Z, Rehman  MU, Tayara  H. et al.  m5C-Seq: machine learning-enhanced profiling of RNA 5-methylcytosine modifications. Comput Biol Med  2024;182:109087. 10.1016/j.compbiomed.2024.109087 [DOI] [PubMed] [Google Scholar]
  • 13. Abbas  Z, Rehman  MU, Tayara  H. et al.  ORI-explorer: a unified cell-specific tool for origin of replication sites prediction by feature fusion. Bioinformatics  2023;39:btad664. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Kim  HH, Kim  DW, Woo  J. et al.  Explicable prioritization of genetic variants by integration of rule-based and machine learning algorithms for diagnosis of rare mendelian disorders. Hum Genomics  2024;18:28. 10.1186/s40246-024-00595-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Naeem  A, Anees  T, Khalil  M. et al.  Snc_net: skin cancer detection by integrating handcrafted and deep learning-based features using dermoscopy images. Mathematics  2024;12:1030. [Google Scholar]
  • 16. Global Cybersecurity Network . Cyber security of genomic data. In: Global Cybersecurity Network Blog, 2024.
  • 17. Schaefer  J, Lehne  M, Schepers  J. et al.  The use of machine learning in rare diseases: a scoping review. Orphanet J Rare Dis  2020;15:1–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Mollaysa  A, Allam  A, Krauthammer  M. Attention-based multi-task learning for base editor outcome prediction. 2023. https://arxiv.org/abs/2310.02919
  • 19. Mondello  A, Dal Bo  M, Toffoli  G. et al.  Machine learning in onco-pharmacogenomics: a path to precision medicine with many challenges. Front Pharmacol  2024;14:1260276. 10.3389/fphar.2023.1260276 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. KPMG . A new era of precision medicine. 2023. https://kpmg.com/kpmg-us/content/dam/kpmg/pdf/2023/kpmg-ai-and-precision-medicine.pdf
  • 21. Aburass  S, Dorgham  O, Al Shaqsi  J. A hybrid machine learning model for classifying gene mutations in cancer using LSTM, BiLSTM, CNN, GRU, and GloVe. Syst Soft Comput  2024;6:200110. 10.1016/j.sasc.2024.200110 [DOI] [Google Scholar]
  • 22. Abbas  Z, Tayara  H, Kil To Chong . Alzheimer’s disease prediction based on continuous feature representation using multi-omics data integration. Chemom Intel Lab Syst  2022;223:104536. 10.1016/j.chemolab.2022.104536 [DOI] [Google Scholar]
  • 23. Rony  MAT, Johora  FT, Thalji  N. et al.  Innovative approach to detecting autism spectrum disorder using explainable features and smart web application. Mathematics  2024;12:3515. 10.3390/math12223515 [DOI] [Google Scholar]
  • 24. Jafar  A, Abidin  ZU, Naqvi  RA. et al.  Unmasking colorectal cancer: a high-performance semantic network for polyp and surgical instrument segmentation. Eng Appl Artif Intel  2024;138:109292. 10.1016/j.engappai.2024.109292 [DOI] [Google Scholar]
  • 25. Mitta  NR. Machine learning for personalized medicine: tailoring treatment strategies based on individual patient data. Hong Kong J AI Med  2023;3:49–87. [Google Scholar]
  • 26. Han  R, Yoon  H, Kim  G. et al.  Revolutionizing medicinal chemistry: the application of artificial intelligence (AI) in early drug discovery. Pharmaceuticals  2023;16:1259. 10.3390/ph16091259 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Serrano  DR, Luciano  FC, Anaya  BJ. et al.  Artificial intelligence (ai) applications in drug discovery and drug delivery: revolutionizing personalized medicine. Pharmaceutics  2024;16:1328. 10.3390/pharmaceutics16101328 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Friedman  AB, Delgado  MK, Weissman  GE. Artificial intelligence for emergency care triage—much promise, but still much to learn. JAMA Netw Open  2024;7:e248857–7. 10.1001/jamanetworkopen.2024.8857 [DOI] [PubMed] [Google Scholar]
  • 29. Tahernejad  A, Sahebi  A, Abadi  ASS. et al.  Application of artificial intelligence in triage in emergencies and disasters: a systematic review. BMC Public Health  2024;24:3203. 10.1186/s12889-024-20447-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Ogunsakin  OL, Anwansedo  S. Leveraging ai for healthcare administration: streamlining operations and reducing costs. IRE Journals 2024;7:235–244. [Google Scholar]
  • 31. Murdoch  B. Privacy and artificial intelligence: challenges for protecting health information in a new era. BMC Med Ethics  2021;22:1–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Chin  MH, Afsar-Manesh  N, Bierman  AS. et al.  Guiding principles to address the impact of algorithm bias on racial and ethnic disparities in health and health care. JAMA Netw Open  2023;6:e2345050–0. 10.1001/jamanetworkopen.2023.45050 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Abbas  SR, Abbas  Z, Zahir  A. et al.  Federated learning in smart healthcare: A comprehensive review on privacy, security, and predictive analytics with iot integration. Healthcare  2024;12:2587. 10.3390/healthcare12242587 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Alsaafin  A, Safarpoor  A, Sikaroudi  M. et al.  Learning to predict rna sequence expressions from whole slide images with applications for search and classification. Commun Biol  2023;6:304. 10.1038/s42003-023-04583-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Li  Q, Yang  X, Xu  J. et al.  Early prediction of alzheimer’s disease and related dementias using real-world electronic health records. Alzheimers Dement  2023;19:3506–18. 10.1002/alz.12967 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Lee  M. Deep learning techniques with genomic data in cancer prognosis: a comprehensive review of the 2021–2023 literature. Biology  2023;12:893. 10.3390/biology12070893 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Ali  MU, Hussain  SJ, Zafar  A. et al.  WBM-DLNets: wrapper-based metaheuristic deep learning networks feature optimization for enhancing brain tumor detection. Bioengineering  2023;10:475. 10.3390/bioengineering10040475 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Patwekar  M, Patwekar  F, Sanaullah  S. et al.  Harnessing artificial intelligence for enhanced parkinson’s disease management: pathways, treatment, and prospects. Trends Immunother  2023;7:2395. 10.24294/ti.v7.i2.2395 [DOI] [Google Scholar]
  • 39. Malla  R, Viswanathan  S, Makena  S. et al.  Revitalizing cancer treatment: exploring the role of drug repurposing. Cancers  2024;16:1463. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Knight  DR, Aakre  CA, Anstine  CV. et al.  Artificial intelligence for patient scheduling in the real-world health care setting: a metanarrative review. Health Policy Technol  2023;12:100824. 10.1016/j.hlpt.2023.100824 [DOI] [Google Scholar]
  • 41. Ranganathan  CS, Basavaraddi  CCS, Saillaja  V. et al.  Dynamic patient triage optimization in healthcare settings using RNNs for decision support. In: 2024 10th International Conference on Smart Computing and Communication (ICSCC), pp. 370–5. IEEE, 2024. [Google Scholar]
  • 42. Nadella  GS, Satish  S, Meduri  K. et al.  A systematic literature review of advancements, challenges and future directions of ai and ML in healthcare. Int J Mach Learn Sustain Dev  2023;5:115–30. [Google Scholar]
  • 43. Ahammed  MF, Labu  MR. Privacy-preserving data sharing in healthcare: advances in secure multiparty computation. J Med Health Stud  2024;5:37–47. 10.32996/jmhs.2024.5.2.4 [DOI] [Google Scholar]
  • 44. CRISPR Medicine News . OpenCRISPR-1: Generative AI Meets CRISPR. CRISPR Medicine News, 2024. [Google Scholar]
  • 45. Bhat  AA, Nisar  S, Mukherjee  S. et al.  Integration of crispr/cas9 with artificial intelligence for improved cancer therapeutics. J Transl Med  2022;20:534. 10.1186/s12967-022-03765-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. SciTech Daily . Creme: A New Ai-Powered Virtual Lab to Help Cure Genetic Diseases. SciTech Daily, 2024. [Google Scholar]
  • 47. Bennett  R, Hemmati  M, Ramesh  R. et al.  Artificial intelligence and machine learning in precision health: an overview of methods, challenges, and future directions. Dyn Dis  2024;15:15–53. [Google Scholar]
  • 48. Udegbe  FC, Ebulue  OR, Ebulue  CC. et al.  Precision medicine and genomics: a comprehensive review of it-enabled approaches. Int Med Sci Res  2024;4:509–20. 10.51594/imsrj.v4i4.1053 [DOI] [Google Scholar]
  • 49. Liao  J, Li  X, Gan  Y. et al.  Artificial intelligence assists precision medicine in cancer treatment. Front Oncol  2023;12:998222. 10.3389/fonc.2022.998222 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Brown  P, van Voorst  R. The influence of artificial intelligence within health-related risk work: A critical framework and lines of empirical inquiry. Health Risk Soc  2024;26:301–16. 10.1080/13698575.2024.2412374 [DOI] [Google Scholar]
  • 51. MacEachern  SJ, Forkert  ND. Machine learning for precision medicine. Genome  2021;64:416–25. 10.1139/gen-2020-0131 [DOI] [PubMed] [Google Scholar]
  • 52. The Australian . Rare Diseases Receive Huge Genomics Boost. The Australian, 2024. [Google Scholar]
  • 53. Asleh  K, Dery  V, Taylor  C. et al.  Extracellular vesicle-based liquid biopsy biomarkers and their application in precision immuno-oncology. Biomarker Res  2023;11:99. 10.1186/s40364-023-00540-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Ahmed  F, Lee  JW, Samantasinghar  A. et al.  Speropredictor: An integrated machine learning and molecular docking-based drug repurposing framework with use case of covid-19. Front Public Health  2022;10:902123. 10.3389/fpubh.2022.902123 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55. Wei  L, Niraula  D, Gates  ED. et al.  Artificial intelligence (ai) and machine learning (ml) in precision oncology: A review on enhancing discoverability through multiomics integration. Br J Radiol  2023;96:20230211. 10.1259/bjr.20230211 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. DeGroat  W, Mendhe  D, Bhusari  A. et al.  Intelligenes: a novel machine learning pipeline for biomarker discovery and predictive analysis using multi-genomic profiles. Bioinformatics  2023;39:btad755. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57. Kumar  S. Early disease detection using AI: a deep learning approach to predicting cancer and neurological disorders. Int J Sci Res Manage  2025;13:2136–55. 10.18535/ijsrm/v13i04.mp02 [DOI] [Google Scholar]
  • 58. Kanjanasurat  I, Tenghongsakul  K, Purahong  B. et al.  CNN–RNN network integration for the diagnosis of covid-19 using chest x-ray and ct images. Sensors  2023;23:1356. 10.3390/s23031356 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59. Song  M, Jung  H, Lee  S. et al.  Diagnostic classification and biomarker identification of alzheimer’s disease with random forest algorithm. Brain Sci  2021;11:453. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60. Albrecht  S, Sprang  M, Andrade-Navarro  MA. et al.  Seqqscorer: Automated quality control of next-generation sequencing data using machine learning. Genome Biol  2021;22:1–20. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61. Zhang  Z, Chen  L, Zhong  F. et al.  Graph neural network approaches for drug-target interactions. Curr Opin Struct Biol  2022;73:102327. 10.1016/j.sbi.2021.102327 [DOI] [PubMed] [Google Scholar]
  • 62. Abate  C, Decherchi  S, Cavalli  A. Graph neural networks for conditional De novo drug design. Wiley Interdiscip Rev Comput Mol Sci  2023;13:e1651. [Google Scholar]
  • 63. Mohammadzadeh-Vardin  T, Ghareyazi  A, Gharizadeh  A. et al.  Deepdra: Drug repurposing using multi-omics data integration with autoencoders. PLoS One  2024;19:e0307649. 10.1371/journal.pone.0307649 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64. Pan  X, Lin  X, Cao  D. et al.  Deep learning for drug repurposing: methods, databases, and applications. Wiley Interdiscip Rev Comput Mol Sci  2022;12:e1597. [Google Scholar]
  • 65. Liu  M, Shen  X, Pan  W. Deep reinforcement learning for personalized treatment recommendation. Stat Med  2022;41:4034–56. 10.1002/sim.9491 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66. Deshmukh  R. Reinforcement learning in healthcare: applications for personalized treatment planning and clinical decision support. Shodh Sagar J Artif Intell Mach Learn  2024;1:19–24. [Google Scholar]
  • 67. Menacho  C, Okawa  S, Alvarez-Merz  I. et al.  Deep learning-driven neuromorphogenesis screenings identify repurposable drugs for mitochondrial disease. bioRxiv  2024.
  • 68. Wang  W, Hu  Y, Fu  F. et al.  Advancement in multi-omics approaches for uterine sarcoma. Biomarker Res  2024;12:129. 10.1186/s40364-024-00673-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Lan  W, Liao  H, Chen  Q. et al.  Deepkegg: A multi-omics data integration framework with biological insights for cancer recurrence prediction and biomarker discovery. Brief Bioinform  2024;25:bbae185. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70. Wellcome Trust . Data and Diversity in Genomics: A Global Landscaping Report. Wellcome, 2024. [Google Scholar]
  • 71. Wellcome Trust . Data and Diversity in Genomics: Landscaping Report. 2024. Accessed: 2025-01-03.
  • 72. Ghebrehiwet  I, Zaki  N, Damseh  R. et al.  Revolutionizing personalized medicine with generative ai: A systematic review. Artif Intell Rev  2024;57:1–41. [Google Scholar]
  • 73. National Human Genome Research Institute . Privacy in genomics. 2024.
  • 74. Online Ethics Center . Case: big data & genetic privacy: Re-identification of anonymized data. 2024. Accessed: 2025-01-03.
  • 75. Yaghi  T. Genetic Data in Jeopardy: Unraveling the Details of the Aftermath of 23andMe Hack. 2024. Accessed: 2025-01-03.
  • 76. Blum  K. When Hospital Ransomware Attacks Threaten Patient Safety: A New Trend to Follow. 2024. Accessed: 2025-01-01.
  • 77. León  A, Pastor  O. Enhancing precision medicine: A big data-driven approach for the management of genomic data. Big Data Res  2021;26:100253. 10.1016/j.bdr.2021.100253 [DOI] [Google Scholar]
  • 78. Venkatesaramani  R, Malin  BA, Vorobeychik  Y. Re-identification of individuals in genomic datasets using public face images. Sci Adv  2021;7. 10.1126/sciadv.abg3296 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79. Compagnucci  Corralesi, Fenwick  M. A multidisciplinary perspective on cross-border health data transfers. Forthcoming  2024. [Google Scholar]
  • 80. Pulivarti  R, Martin  N. et al.  Cybersecurity of genomic data. In: Technical Report. US Department of Commerce, National Institute of Standards and Technology, 2023. [Google Scholar]
  • 81. Mann  SP, Treit  PV, Geyer  PE. et al.  Ethical principles, constraints, and opportunities in clinical proteomics. Mol Cell Proteomics  2021;20:100046. 10.1016/j.mcpro.2021.100046 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82. Vadapalli  S, Abdelhalim  H, Zeeshan  S. et al.  Artificial intelligence and machine learning approaches using gene expression and variant data for personalized medicine. Brief Bioinform  2022;23:bbac191. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83. Karim  MR, Islam  T, Lange  C. et al.  Adversary-aware multimodal neural networks for cancer susceptibility prediction from multiomics data. IEEE Access  2022;10:54386–409. 10.1109/ACCESS.2022.3175816 [DOI] [Google Scholar]
  • 84. Tschider  C, Compagnucci  MC, Minssen  T. The new eu–us data protection framework’s implications for healthcare. J Law Biosci  2024;11. 10.1093/jlb/lsae022 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85. Zhang  M, Sankaranarayanapillai  M, Du  J. et al.  Machine learning-based donor permission extraction from informed consent documents. BMC Bioinf  2023;24:477. 10.1186/s12859-023-05568-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86. Landry  LG, Ali  N, Williams  DR. et al.  Lack of diversity in genomic databases is a barrier to translating precision medicine research into practice. Health Aff  2018;37:780–5. 10.1377/hlthaff.2017.1595 [DOI] [PubMed] [Google Scholar]
  • 87. Skovorodnikov  H, Fimba  HA. Evaluating the robustness of ai in genomics via feature importance adversarial attacks. arXiv preprint arXiv:2401.10657. 2024.
  • 88. Kolobkov  D, Mishra Sharma  S, Medvedev  A. et al.  Efficacy of federated learning on genomic data: a study on the UK Biobank and the 1000 Genomes Project. Front Big Data  2024;7:1266031. 10.3389/fdata.2024.1266031 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89. Harshita  S, Adhitya  CJ, Naidu  PG. et al.  Genetic privacy shields: A dna steganography approach for multi-level text encryption: Unveiling the future of genetic data protection. In: 2023 First International Conference on Advances in Electrical, Electronics and Computational Intelligence (ICAEECI), pp. 1–8. IEEE, 2023. [Google Scholar]
  • 90. Arshad  S, Arshad  J, Khan  MM. et al.  Analysis of security and privacy challenges for dna-genomics applications and databases. J Biomed Inform  2021;119:103815. 10.1016/j.jbi.2021.103815 [DOI] [PubMed] [Google Scholar]
  • 91. Abudalou  M. Enhancing data security through advanced cryptographic techniques. Int J Comput Sci Mob Comput 2024;13:88–92. [Google Scholar]
  • 92. Zhang  S, Kim  A, Liu  D. et al.  Genie: A secure, transparent sharing and services platform for genetic and health data. arXiv preprint arXiv:1811.01431. 2018.
  • 93. Blindenbach  J, Kang  J, Hong  S. et al.  Squid: ultra-secure storage and analysis of genetic data for the advancement of precision medicine. Genome Biol  2024;25:1–27. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94. Gudodagi  R, Reddy  RVS. Encryption and Decryption of Secure Data for Diverse Genomes. In: International Conference on Artificial Intelligence and Sustainable Engineering: Select Proceedings of AISE 2020, 1, pp. 505–14. Singapore: Springer Nature Singapore, 2022. 10.1007/978-981-16-8542-2_41 [DOI] [Google Scholar]
  • 95. Calvino  G, Peconi  C, Strafella  C. et al.  Federated learning: Breaking down barriers in global genomic research. Genes  2024;15:1650. 10.3390/genes15121650 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96. Jia  H, Tan  S, Zhang  YE. Chasing sequencing perfection: marching toward higher accuracy and lower costs. Genomics Proteomics Bioinf  2024;22:qzae024. 10.1093/gpbjnl/qzae024 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97. Heliö  K, Cicerchia  M, Hathaway  J. et al.  Diagnostic yield of genetic testing in a multinational heterogeneous cohort of 2088 dcm patients. Front Cardiovasc Med  2023;10:1254272. 10.3389/fcvm.2023.1254272 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98. Wekesa  JS, Kimwele  M. A review of multi-omics data integration through deep learning approaches for disease diagnosis, prognosis, and treatment. Front Genet  2023;14:1199087. 10.3389/fgene.2023.1199087 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99. Poplin  R, Chang  PC, Alexander  D. et al.  A universal snp and small-indel variant caller using deep neural networks. Nat Biotechnol  2018;36:983–7. 10.1038/nbt.4235 [DOI] [PubMed] [Google Scholar]
  • 100. Borfitz  D. AI System for Diagnosing Rare Diseases and Solving Medical Cold Cases. 2024.
  • 101. Bora  A, Banupriya  G. CRISPR-Cas9 technology: applications and trends in bioinformatics and machine learning. 2024.
  • 102. El-Atawneh  S, Goldblum  A. A machine learning algorithm suggests repurposing opportunities for targeting selected gpcrs. Int J Mol Sci  2024;25. 10.3390/ijms251810230 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103. Kose  U, Sengoz  N, Chen  X. et al.  Explainable Artificial Intelligence (XAI) in Healthcare. CRC Press, Boca Raton, 2024. 10.1201/9781003426073 [DOI] [Google Scholar]
  • 104. Saraswat  D, Bhattacharya  P, Verma  A. et al.  Explainable AI for healthcare 5.0: opportunities and challenges. IEEE Access  2022;10:84486–517. [Google Scholar]
  • 105. Markov  ML. Lime vs shap. 2023. https://www.markovml.com/blog/lime-vs-shap
  • 106. Zhou  Z, Hu  M, Salcedo  M. et al.  XAI meets biology: a comprehensive review of explainable AI in bioinformatics applications. arXiv preprint arXiv:2312.06082. 2023.
  • 107. Johnsen  PV, Riemer-Sørensen  S, DeWan  AT. et al.  A new method for exploring gene–gene and gene–environment interactions in GWAS with tree ensemble methods and shap values. BMC Bioinf  2021;22:1–29. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108. Rao  Y, Zhang  L, Gao  L. et al.  ExAutoGP: enhancing genomic prediction stability and interpretability with automated machine learning and SHAP. Animals (Basel)  2025;15:1172. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109. Ji  Y, Zhou  Z, Liu  H. et al.  Dnabert: Pre-trained bidirectional encoder representations from transformers model for dna-language in genome. Bioinformatics  2021;37:2112–20. 10.1093/bioinformatics/btab083 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110. Dalla-Torre  G, Meier  J, Bühlmann  P. et al.  Nucleotide transformer: Building and evaluating robust foundation models for human genomics. Nat Commun  2023;14:3185. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111. Alyass  A, Turcotte  M, Meyre  D. From big data analysis to personalized medicine for all: Challenges and opportunities. BMC Med Genomics  2015;8:1–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112. Ma  X, Shao  Y, Tian  L. et al.  Analysis of error profiles in deep next-generation sequencing data. Genome Biol  2019;20:1–15. 10.1186/s13059-019-1659-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113. Olbrich  M, Bartels  L, Wohlers  I. Sequencing technologies and hardware-accelerated parallel computing transform computational genomics research. Front Bioinf  2024;4:1384497. 10.3389/fbinf.2024.1384497 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114. Parisineni  SRA, Pal  M. Enhancing trust and interpretability of complex machine learning models using local interpretable model agnostic shap explanations. Int J Data Sci Anal  2024;18:457–66. 10.1007/s41060-023-00458-w [DOI] [Google Scholar]
  • 115. Bruns  A, Winkler  EC. Dynamic consent: a royal road to research consent?  J Med Ethics  2024.  jme-2024-110153. 10.1136/jme-2024-110153 [DOI] [PubMed] [Google Scholar]
  • 116. Kho  AN, Rasmussen  LV, Connolly  JJ. et al.  Practical challenges in integrating genomic data into the electronic health record. Genet Med  2013;15:772–8. 10.1038/gim.2013.131 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117. Couckuyt  A, Seurinck  R, Emmaneel  A. et al.  Challenges in translational machine learning. Hum Genet  2022;141:1451–66. 10.1007/s00439-022-02439-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 118. National Human Genome Research Institute . NHGRI machine learning in genomics workshop: tools, resources, clinical applications, and ethics. Exec Sum  2021. [Google Scholar]
  • 119. Mohr  AE, Ortega-Santos  CP, Whisner  CM. et al.  Navigating challenges and opportunities in multi-omics integration for personalized healthcare. Biomedicines  2024;12:1496. 10.3390/biomedicines12071496 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 120. Genomic Medicine Network . The cost of genomics: economic challenges in genetic medicine. 2024. https://www.genomicmedicinenetwork.com/news/the-cost-of-genomics-economic-challenges-in/-genetic-medicine/
  • 121. Page  A, Haendel  M, Freimuth  RR. A community approach to standards development: the Global Alliance for Genomics and Health (GA4GH). In Genomic Data Sharing, 71–90. Academic Press, 2023. 10.1016/B978-0-12-819803-2.00011-0 [DOI] [Google Scholar]
  • 122. Thorogood  A, Rehm  HL, Goodhand  P. et al.  International federation of genomic medicine databases using GA4GH standards. Cell Genomics  2021;1:100032. 10.1016/j.xgen.2021.100032 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 123. Toussaint  PA, Leiser  F, Thiebes  S. et al.  Explainable artificial intelligence for omics data: a systematic mapping study. Brief Bioinform  2024;25:bbad453. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 124. Mirakhori  F, Niazi  SK. Harnessing the AI/ML in drug and biological products discovery and development: The regulatory perspective. Pharmaceuticals  2025;18:47. 10.3390/ph18010047 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 125. Future directions in genomics and health equity. 2022.
  • 126. Whole-genome alignment: methods, challenges, and future directions. MDPI. Appl Sci  2023;14:4837. [Google Scholar]
  • 127. Machine learning applications for therapeutic tasks with genomics data. Patterns 2021;2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 128. Clearing the fog: a scoping literature review on the ethical issues. J Pers Med  2023;14:443. 10.3390/jpm14050443 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 129. Protecting reproductive health data: State laws against geofencing. 2025.
  • 130. Genome UK: shared commitments for UK-wide implementation 2022 to 2025. 2025.
  • 131. Universal declaration on bioethics and human rights. 2024.
  • 132. Genomics delivery plan for Wales. 2022.
  • 133. Kuo  TT, Jiang  X, Tang  H. et al.  The evolving privacy and security concerns for genomic data analysis and sharing as observed from the iDASH competition. J Am Med Inform Assoc  2022;29:2182–90. 10.1093/jamia/ocac165 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 134. Yang  J, Wang  L, Liu  L. et al.  GraphPCA: A fast and interpretable dimension reduction algorithm for spatial transcriptomics data. Genome Biol  2024;25:287. 10.1186/s13059-024-03429-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 135. Tutorials  O. Mining Massive Biological Datasets for New Discoveries with Scalable Algorithms. 2023.
  • 136. Hassija  V, Chamola  V, Mahapatra  A. et al.  Interpreting black-box models: a review on explainable artificial intelligence. Cogn Comput  2024;16:45–74. 10.1007/s12559-023-10179-8 [DOI] [Google Scholar]
  • 137. Askin  S, Burkhalter  D, Calado  G. et al.  Artificial intelligence applied to clinical trials: opportunities and challenges. Health Technol  2023;13:203–13. 10.1007/s12553-023-00738-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 138. Siganporia  A, Varia  M, Gorimar  U. et al.  Privacy-enhanced federated learning for rare genetic disorder classification with EHR. In: 2023 Global Conference on Information Technologies and Communications (GCITC), pp. 1–8. IEEE, 2023. [Google Scholar]
  • 139. Yang  L, Tian  M, Xin  D. et al.  AI-driven anonymization: protecting personal data privacy while leveraging machine learning. arXiv preprint, arXiv:2402.17191. 2024.
  • 140. Groft  SC, Posada  M, Taruscio  D. Progress, challenges and global approaches to rare diseases. Acta Paediatr  2021;110:2711–6. 10.1111/apa.15974 [DOI] [PubMed] [Google Scholar]
  • 141. Zhou  T, Zhou  Y. Enhancing model generalisability through sampling diverse and balanced retinal images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 678–88, Cham, 2024. Springer Nature Switzerland, 10.1007/978-3-031-72378-0_63 [DOI] [Google Scholar]
  • 142. Pun  FW, Liu  BHM, Long  X. et al.  Identification of therapeutic targets for amyotrophic lateral sclerosis using PandaOmics–an AI-enabled biological target discovery platform. Front Aging Neurosci  2022;14:914017. 10.3389/fnagi.2022.914017 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 143. GxP-CC . Explainable AI in life sciences: Understanding the ‘why’. 2023. Accessed: 2025-01-03.
  • 144. Bertagnolli  MM. Advancing health through artificial intelligence/machine learning: the critical importance of multidisciplinary collaboration. PNAS Nexus  2023;2:pgad356. [Google Scholar]
  • 145. Joly  Y, Dove  E, Knoppers  BM. et al.  The GA4GH regulatory and ethics work stream (REWS) at 10: An interdisciplinary, participative approach to international policy development in genomics. In: The Law and Ethics of Data Sharing in Health Sciences, pp. 13–32. Singapore: Springer Nature Singapore, 2023. [Google Scholar]
  • 146. Sigala  RE, Lagou  V, Shmeliov  A. et al.  Machine learning to advance human genome-wide association studies. Genes  2023;15:34. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 147. Sobczyk  MK, Gaunt  TR, Paternoster  L. MendelVar: gene prioritization at GWAS loci using phenotypic enrichment of Mendelian disease genes. Bioinformatics  2021;37:1–8. 10.1093/bioinformatics/btaa1096 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 148. Jacobsen  JO, Kelly  C, Cipriani  V. et al.  Evaluation of phenotype-driven gene prioritization methods for Mendelian diseases. Brief Bioinform  2022;23. 10.1093/bib/bbac188 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 149. Wang  J, Dayem Ullah  AZ, Chelala  C. IW-scoring: an integrative weighted scoring framework for annotating and prioritizing genetic variations in the noncoding genome. Nucleic Acids Res  2018;46:e47. 10.1093/nar/gky057 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 150. HealthIT.gov . Sync for genes phase 5 final report. 2023. https://www.healthit.gov/sites/default/files/page/2023-05/SyncforGenesPhase5_Final%20Report_508.pdf
  • 151. Bi  Y, Xiang  D, Ge  Z. et al.  An interpretable prediction model for identifying N7-methylguanosine sites based on XGBoost and SHAP. Mol Ther Nucleic Acids  2020;22:362–72. 10.1016/j.omtn.2020.08.022 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 152. PTP Cloud . Exploring the power of high-performance computing (HPC) for life sciences. 2023. https://ptp.cloud/exploring-the-power-of-high-performance-/computing-hpc-for-life-sciences/. Accessed: 2025-01-03
  • 153. Labiotech.eu . CRISPR and AI: How the technologies are transforming medicine. 2023. https://www.labiotech.eu/in-depth/crispr-ai/
  • 154. Macaulay  IC, Voet  T. Single cell genomics: Advances and future perspectives. PLoS Genet  2014;10:e1004126. 10.1371/journal.pgen.1004126 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 155. Xiong  J, Gong  F, Ma  L. et al.  scVIC: deep generative modeling of heterogeneity for scRNA-seq data. Bioinf Adv  2024;4:vbae086. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 156. Wang  J, Ma  A, Chang  Y. et al.  scGNN is a novel graph neural network framework for single-cell RNA-seq analyses. Nat Commun  2021;12:1882. 10.1038/s41467-021-22197-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 157. Ozcelik  F, Dundar  MS, Yildirim  AB. et al.  The impact and future of artificial intelligence in medical genetics and molecular medicine: An ongoing revolution. Funct Integr Genomics  2024;24:138. 10.1007/s10142-024-01417-9 [DOI] [PubMed] [Google Scholar]
  • 158. Precision Medicine . AI-driven precision medicine gains momentum in 2024. Precision medicine. Online  2024. [Google Scholar]
  • 159. The Australian . Magic Medicine? The Revolution in Genes and Health. The Australian, 2024. [Google Scholar]
  • 160. The Atlantic . The Most Important Breakthroughs of 2024. The Atlantic, 2024. [Google Scholar]

Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
