BioMedicine. 2025 Mar 1;15(1):6–22. doi: 10.37796/2211-8039.1643

Mini-review of clinical data service platforms in the era of artificial intelligence: A case study of the iHi data platform

Yu-Ting Lin a,b,1, Ya-Chi Lin a,b,1, Hung-Lin Chen a,b, Che-Chen Lin a, Min-Yen Wu a, Sheng-Hsuan Chen a, Zi-Han Lin a, Yi-Ching Chang a, Chuan-Hu Sun a, Sheng-Ya Lu a, Min-Yu Chiang a, Hui-Chao Tsai a, Mei-Ju Shih a, David Ray Chang c,d, Fuu-Jen Tsai e, Hsiu-Yin Chiang a,b,*, Chin-Chi Kuo a,b,c,e,**
PMCID: PMC11959964  PMID: 40176862

Abstract

In the past two decades, healthcare organizations have transitioned from the early stages of digitization and digitalization to a more comprehensive process of digital transformation, a shift significantly accelerated by the advent of artificial intelligence (AI). Consequently, the development of high-quality clinical data warehouses, derived from electronic health records (EHRs) and enriched with multidomain data, such as genomics, proteomics, and Internet of Things (IoT) information, has become essential for the creation of the modern patient digital twin (PDT). This approach is critical for leveraging AI in the evolving landscape of clinical practice. Leading medical centers and healthcare institutions have adopted this model, as summarized in this review.

Since 2020, China Medical University Hospital (CMUH) has been constructing its data ecosystem by integrating EHRs with extensive genomic databases. This initiative has led to the development of a data service platform, the ignite Hyper-intelligence (iHi®) platform. The iHi platform serves as a case study exemplifying the workflow of the smart data chip, which facilitates the deep cleaning and reliable de-identification of clinical data while incorporating analytical platforms related to genomics and the microbiome to enhance insight extraction processes. The ability to predict complex interactions and disease trajectories among PDTs, digital counterparts of healthcare professionals, and virtual socioeconomic environments will be pivotal in advancing personalized healthcare and optimizing patient outcomes. Future challenges will involve the unification of cross-institutional data platforms and ensuring the interoperability of AI inferences—key factors that will define the next era of AI-driven healthcare.

Keywords: Artificial intelligence, Data ecosystem, Data platforms, Digital twin, iHi platform

1. Introduction

Healthcare data serves as the cornerstone of artificial intelligence (AI) development in the modern era [1–3]. However, the prevalence of fragmented healthcare data silos, an inherent limitation of contemporary electronic health records (EHRs), poses significant challenges. These silos contribute to the formation of clinical AI service barriers, impeding the development of a sustainable AI service ecosystem and complicating the creation of an integrated AI landscape. Despite the importance of addressing this issue, systematic evaluations of healthcare data governance and the quality of data service ecosystems, particularly in terms of clinical applications and enterprise adoption, remain scarce [4]. While several leading medical centers have developed proprietary data ecosystems to support advancements in clinical AI, none have tackled the resolution of fragmented AI deployment within the framework of data intelligence. As a result, AI silos persist, leading to considerable inefficiencies in research and development, which often fail to inform policy-making or drive transformations in care models [5].

In response to these challenges, the Big Data Center (BDC) of China Medical University Hospital (CMUH) launched the ignite Hyper-intelligence (iHi®) platform in 2020. This initiative aims to create a solid infrastructure for the development and validation of healthcare AI technologies, while facilitating improved data-human interaction and the integration of AI solutions into daily practice. In addressing the common issue of “data hunger” in the healthcare AI development ecosystem, the platform focuses on the automatic preparation of high-quality datasets for healthcare professionals, thus efficiently meeting critical needs. By establishing a “health learning triangle” that emphasizes the transition from data to actionable insights and, ultimately, to clinical practice, the iHi platform enhances the sustainability of the AI ecosystem across the healthcare sector. This model fosters a unique AI ecosystem learning triangle, linking the healthcare industry, policymaking, and scientific discovery (Fig. 1).

Fig. 1.

The Dual-Triangle Model of the iHi Data Platform: Enhancing AI Sustainability in Healthcare. The inner triangle illustrates the core components (Data, Insight, and Practice), highlighting the transformation of data into actionable insights and their subsequent application in clinical practice. This foundational structure underpins the broader objectives represented by the outer triangle, which encompasses the needs of Government, Academia, and Industry. Together, these interconnected elements enable the platform to generate innovative insights, address the dynamic requirements of healthcare, and foster cross-sector collaboration.

This review examines recent advancements in modern data service platforms and evaluates how their data strategies align with our double-triangle data learning model. The primary aim of this article is to explore the potential for global unification of data service platforms, a critical step toward dismantling the persistent cycle of AI service silos in clinical practice. This, in turn, would support the transition toward a global digital twin initiative, paving the way for the incorporation of foundational model concepts [6,7].

2. Evolution of EHRs in Taiwan and the United States

The evolution of Electronic Health Records (EHRs) from simple digital replacements for paper charts to sophisticated systems supporting a wide range of clinical and administrative functions marks a significant advancement in healthcare technology [8–10]. The first prototype of an EHR, designed as a clinical information system, was developed by Lockheed in the mid-1960s. Around the same period, the University of Utah created an early clinical decision support system. The U.S. government began adopting EHRs in the 1970s, initially through a pilot program within the Veterans Affairs’ Decentralized Hospital Computer Program. After a decade of research, the Institute of Medicine recommended the widespread adoption of EHRs to enhance patient safety, collaborating with HL7, an international nonprofit organization, to establish standards for information exchange and operational functions within EHR systems.

The adoption of EHRs in the United States gained significant momentum in 2009, when they were recognized as a key element of President Obama’s Health Information Technology for Economic and Clinical Health (HITECH) Act, which accelerated their implementation across U.S. hospitals. The digitization of healthcare rapidly expanded throughout the 2010s (Fig. 2). Although the United States led the early adoption of EHRs, the involvement of three distinct parties—government, vendors, and payers—resulted in the development of separate, non-integrated systems. As a consequence, communication between these systems remains inadequate, hindering full interoperability across various EHR platforms.

Fig. 2.

Comparative timelines and key milestones in the evolution of electronic health records (EHRs) in the United States and Taiwan, alongside the concurrent development of the internet and artificial intelligence (AI). The milestones are represented by dot circles, with the red dot indicating the emergence of data service platforms. The orange and green lines depict the trends of paper-based medical records and EHRs, respectively. The red shaded area highlights the trend toward the establishment of EHR-based data service platforms.

In contrast, the adoption of EHRs in Taiwan has progressed more slowly than in the United States, largely due to its close integration with the development of the National Health Insurance (NHI) system. This relationship has facilitated the creation of a robust IT infrastructure that supports interoperability across EHR systems. In 2000, following the achievement of a 100% electronic declaration rate through the NHI system, Taiwan’s Department of Health launched a nationwide EHR initiative. A pivotal milestone occurred in 2011 with the establishment of the Electronic Medical Record (EMR) Exchange Center, coordinated by the Healthcare Certification Authority Network [11], which contributed to the standardization of national EHR interoperability.

Significant progress in interoperability was made in 2016 with the implementation of the NHI Cloud-Based Inquiry System for Medical Care Information, enabling the retrieval of 12 types of clinical data, including laboratory results, surgical records, vaccination histories, and medical images from CT and MRI scans [12]. Currently, Taiwan’s Ministry of Health and Welfare is advancing an international EHR ecosystem by adopting Fast Healthcare Interoperability Resources (FHIR) [13].

The disparities in EHR development between the United States and Taiwan are multifaceted, shaped by differences in healthcare system structures and policy frameworks. In Taiwan, the level of interoperability is closely tied to reimbursement practices. For instance, hospitals experience reduced reimbursement rates if they fail to upload clinical data to the NHI Cloud. When effectively utilized, national EHR databases have the potential to serve as the foundation for developing patient digital twins (PDTs), which would document the individual disease trajectories of patients.

3. EHRs evolving into data service platforms in the age of AI

The unprecedented demand for AI applications in real-world settings has positioned electronic health records (EHRs) as the cornerstone of today’s health intelligence ecosystem. The primary strength of EHRs lies in their capacity to function as comprehensive repositories of patient data—encompassing demographics, medical history, vital signs, laboratory results, and imaging studies—provided that the institutional infrastructure can effectively track the patient’s data trajectory within its own EHR system [14]. When meaningfully cleaned, this rich dataset offers fertile ground for AI development, enabling algorithms to analyze disease patterns, predict patient outcomes, and support clinical decision-making [1,15].

Developing economies such as India are making substantial investments in national digital health missions, with EHRs at their core. These initiatives aim to establish interoperable systems that enhance healthcare delivery and improve patient outcomes [16]. Developed economies like the United States and the United Kingdom are also advancing, with data service platforms such as Stanford Data Science Resources (SDSR), Mount Sinai Data Warehouse (MSDW), and the UK’s General Practice Research Database (GPRD) facilitating large-scale, data-driven innovations in personalized medicine, advancing translational research, and supporting intelligent clinical decision-making [17–19]. The future potential of data intelligence can be maximized when EHRs are further integrated with national biobanks based on general populations, such as the All of Us initiative, the UK Biobank, and the Taiwan Biobank [20–22].

Similar to current pharmacological standards, where drug approvals rely on multicenter clinical trials to ensure generalizability, AI- and data-driven digital solutions must undergo a comparable clearance process [23]. A critical challenge in this context is the fragmentation of data within and across healthcare systems [24], further complicated by legacy issues and incompatible information technology infrastructures [25]. Overcoming these challenges is essential to fully harness data intelligence and AI in medicine and public health. Another major challenge involves heterogeneous data quality, stemming from inconsistent data governance strategies and variations in common data models, such as the Observational Medical Outcomes Partnership, as well as issues related to missing values or incomplete data entries due to regional practice differences and inadequate follow-up [26,27].

Despite growing global interest in AI applications, the medical field continues to face delays in data preparation. Increasing awareness underscores that data is the essential fuel required to unlock AI’s potential, from disease prognosis prediction to liability control, positioning EHRs at the core of the clinical AI ecosystem.

4. Data services in leading research platforms

The adoption of data services by prominent U.S. institutions accelerated in the 2010s, with leaders like the Mayo Clinic, Stanford University, and Mount Sinai Medical Center integrating electronic health records (EHRs) as primary repositories for translational research [17,18,28]. This shift paralleled the rapid advancement of artificial intelligence (AI) technologies, yet the emphasis on AI applications sometimes overshadowed the essential role of foundational data intelligence needed to drive meaningful AI insights. Each platform represents unique data governance strategies aimed at reducing data silos and enhancing data quality. This article offers a concise overview of the core characteristics of major platforms, including the Mayo Clinic Platform, the Mount Sinai Data Warehouse (MSDW), the Stanford Data Science Resources (SDSR), and the General Practice Research Database/Clinical Practice Research Datalink (GPRD/CPRD).

4.1. Mayo Clinic Platform

The Mayo Clinic Platform integrates data from over 10 million patients, comprising 10.4 million records. This comprehensive dataset includes 1.6 billion laboratory test results, 10 million pathology reports, 698 million clinical notes, and more than 400 million medical images. Among the images are 241 million CT scans, 146 million MRI scans, 10.2 million electrocardiograms, and 2.5 million positron emission tomography scans [28]. All clinical data are de-identified in accordance with Health Insurance Portability and Accountability Act (HIPAA) regulations, ensuring that both structured and unstructured information is anonymized to protect patient privacy [29].

Built on this secure infrastructure, Mayo Clinic, in collaboration with Google, has developed a cloud-based Clinical Data Analytics Platform to enable secure data sharing. This platform allows third parties to develop and validate novel algorithms and conduct data analyses while ensuring that data remains securely within Mayo Clinic’s environment, never leaving the Mayo container [30]. This approach has unlocked the commercial potential of the Mayo Clinic Platform, leading to the launch of the Mayo Accelerate Program, which partners with global startups to test and validate their AI-driven or data intelligence-based digital solutions.

4.2. Mount Sinai Data Warehouse (MSDW)

MSDW collects clinical and operational data from the Epic Electronic Health Record (EHR) system at Mount Sinai Health System, granting researchers access to over 11 million patient records and more than 87 million patient encounters since its inception in 2011 [18]. The Mount Sinai Health System, which includes facilities such as Mount Sinai Hospital, Queens, West, Morningside, Brooklyn, and Beth Israel, transitioned to the Epic EHR system between 2011 and 2020. Along with other ancillary systems, Epic serves as the primary data source for MSDW. To facilitate optimal data sharing and interoperability both internally and externally, clinical data are extracted and transformed into the Observational Medical Outcomes Partnership (OMOP) common data model. The dataset is updated daily, ensuring the availability of current data for research and operational purposes.

MSDW operates on the Minerva High-Performance Computing (HPC) cluster, which also supports other research datasets [31]. Minerva has been HIPAA-compliant since October 1, 2020, ensuring the secure storage and processing of protected health information.

4.3. Stanford Data Science Resources (SDSR)

The Stanford Medicine Data Science Resource (SDSR) provides access to large-scale datasets through key repositories, including the Stanford Medicine Research Data Repository (STARR), the Stanford Cancer Institute Research DataBase (SCIRDB), and the Population Health Sciences (PHS) Data Portal. STARR contains over two decades of patient data from Stanford Health Care and Stanford Children’s Health, with updates occurring every 24–36 h [17]. SCIRDB integrates cancer research data with national health insurance registries, while the PHS Data Portal hosts 83 population health datasets [32], making these resources essential for advancing health science research [33].

The SDSR data science ecosystem is underpinned by a HIPAA-compliant computing infrastructure that supports secure search, access, analysis, and de-identification pipelines, enabling the extraction of valuable insights from healthcare IT systems [17]. Notably, Stanford’s PHS data ecosystem was specifically developed to securely manage and share large-scale, high-risk health data covering hundreds of millions of individuals. It is accessible to researchers at Stanford University and is designed with replication capabilities for use at other institutions.

Despite the platform’s effectiveness in hosting and curating vast datasets, challenges persist in enhancing data discoverability, accessibility, and reusability, which are critical for fully supporting translational research [32]. To ensure long-term support for translational research using real-world data, the implementation of advanced technological solutions, robust management structures, and educational initiatives that foster collaboration among researchers, data scientists, and the broader community have been identified as key requirements [32].

4.4. General Practice Research Database/Clinical Practice Research Datalink (GPRD/CPRD)

The General Practice Research Database (GPRD), renamed the Clinical Practice Research Datalink (CPRD) in 2012, is a computerized repository containing anonymized patient records [19]. Since its inception in 1987, the CPRD has continuously accumulated data and currently holds information on 60 million patients from the United Kingdom, including 18 million actively registered patients [34]. The CPRD provides real-world data services that support both retrospective and prospective public health and clinical research. These services are managed by the Medicines and Healthcare Products Regulatory Agency (MHRA) with support from the National Institute for Health and Care Research (NIHR) [35,36].

The CPRD encompasses anonymized electronic health records (EHRs) from primary care practices, including patient demographics, diagnoses, medication exposures, and laboratory test results. Additionally, it integrates primary care data with other health datasets, such as hospital admissions, mortality records, cancer registries, and socioeconomic data, providing a comprehensive perspective on patient health. To maintain public trust and ensure data integrity, the CPRD has established the Research Data Governance process, which enables secure access to its data while upholding legal and ethical standards for research and healthcare purposes [37].

As institutions develop their own artificial intelligence (AI) ecosystems, driven by advanced data service platforms, a key challenge lies in data governance. One major issue is the approach to data cleaning: many medical centers adopt a decentralized model, where datasets are made available through service platforms or access-controlled tables, allowing users to work with raw data. In this model, users are often responsible for developing their own data cleaning methods to ensure data quality before analysis. Alternatively, a centralized data cleaning approach, supported by standardized codebooks, provides a more efficient solution. This method organizes high-quality, theme-based databases, enabling users to access clean and well-structured data. Furthermore, centralized cleaning facilitates the integration of multiple high-quality datasets, supporting the “data LEGO” concept, ensuring analytical accuracy, and mitigating the “garbage in, garbage out” problem.

4.5. Insufficient integration of external clinical information

One of the key challenges in healthcare data management is the inadequate integration of clinical information from external sources [38]. Patient data are often dispersed across various hospitals, insurance providers, and government agencies, leading to data fragmentation, which can introduce bias into hypothesis testing and reduce the efficiency of AI model development. Data ecosystems developed by insurance payers, such as Kaiser Permanente and Geisinger, are often better integrated because they encompass data from multiple hospitals within a unified, insurance-based healthcare system [39,40]. However, when patients seek care from different healthcare systems or change insurance providers, linking their medical data remains a challenge. In Taiwan, the universal healthcare system presents a unique opportunity to develop a comprehensive patient tracking environment with minimal loss to follow-up [41].

4.6. Limited scope of EHRs in PDTs

Electronic Health Records (EHRs) capture approximately 20% of a patient’s total life data, which is inadequate for constructing a comprehensive Patient Digital Twin (PDT)—a virtual representation of an individual’s health trajectory [6]. Even with the integration of genomic and other omics data, this percentage increases only marginally, reaching about 30% [7]. The majority of relevant patient data—estimated at 70%–80%—arises from daily activities such as lifestyle factors (e.g., diet, sleep, exercise) and environmental exposures. However, these factors are typically not included in traditional EHRs. This discrepancy highlights the need for standardization in incorporating data from Internet of Things (IoT) devices and app-generated health information into healthcare systems, potentially through standardized data formats like FHIR [42]. Therefore, the development of next-generation EHR systems is urgent. These systems would not only serve as foundational components for PDT construction but also facilitate the seamless integration of artificial intelligence (AI) technologies. As AI applications increasingly rely on diverse and comprehensive datasets, addressing this gap is critical for advancing personalized medicine and intelligent healthcare—a challenge that extends beyond the scope of this review.
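
To illustrate how a reading from a wearable device could be carried in a standardized format, the following Python sketch wraps a single heart-rate measurement as a dictionary shaped like a FHIR Observation resource. The function name, the patient identifier, and the choice of heart rate as the example signal are assumptions made for illustration only; they do not describe the iHi platform or any other system covered in this review. The LOINC code 8867-4 (heart rate) and the UCUM unit `/min` follow the common FHIR vital-signs convention.

```python
import json
from datetime import datetime, timezone


def wearable_heart_rate_to_fhir(patient_id: str, bpm: float, measured_at: datetime) -> dict:
    """Wrap one wearable heart-rate reading as a FHIR Observation-shaped dict.

    Hypothetical helper: it only builds the JSON structure and performs no
    validation against a FHIR server or profile.
    """
    return {
        "resourceType": "Observation",
        "status": "final",
        "category": [{
            "coding": [{
                "system": "http://terminology.hl7.org/CodeSystem/observation-category",
                "code": "vital-signs",
            }]
        }],
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "8867-4",
                "display": "Heart rate",
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": measured_at.isoformat(),
        "valueQuantity": {
            "value": bpm,
            "unit": "beats/minute",
            "system": "http://unitsofmeasure.org",
            "code": "/min",
        },
    }


# Example with a made-up patient identifier and timestamp.
observation = wearable_heart_rate_to_fhir(
    "example-123", 72, datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
)
payload = json.dumps(observation, indent=2)
```

Expressing device data in a shared resource shape like this is what would allow lifestyle and environmental signals to sit alongside EHR-derived records in a PDT, whatever the originating device or app.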

In summary, since the mid-2010s, Electronic Health Records (EHRs) have gradually transformed into advanced data service platforms that support modern clinical decision support systems incorporating artificial intelligence (AI) and data analytics. To further enhance the integration of AI in healthcare, the next critical step will be the unification of these service platforms across diverse economic and healthcare systems.

5. Overview of the iHi data platform

China Medical University Hospital (CMUH), the largest medical center in central Taiwan, has been dedicated to establishing a robust medical data ecosystem to support AI and digital infrastructure for a smart healthcare system [43]. Since 2017, the Big Data Center (BDC) at CMUH has led the development of the iHi Platform, a comprehensive data ecosystem aimed at fostering hyper-intelligence in healthcare. This large-scale, interoperable platform integrates phenotypic, genomic, and environmental data, including 20 years of electronic medical records (EMRs) and environmental exposure data from over 3.5 million patients, along with genetic data from 400,000 individuals. Additionally, the platform is linked with National Health Insurance (NHI) data (Fig. 3).

Fig. 3.

Overview of the iHi Platform: Data Quality, Security, Application, and Knowledge Translation. VDI refers to Virtual Desktop Infrastructure.

The iHi Platform operates with a “data LEGO” approach, featuring a standardized data cleaning pipeline, integrated infrastructure, an ISO-certified, regulation-compliant de-identification process, and a secure virtual working environment. These components enable web-based data exploration and support no-code multi-omics analysis, offering extensive and integrated real-world evidence. This infrastructure not only enhances the quality of medical education and supports clinical research but also promotes the sustainability of healthcare innovations.

5.1. iHi platform infrastructure for comprehensive data services

The iHi platform provides comprehensive, secure data services, emphasizing both data protection and user accessibility (Fig. 4). To meet diverse user needs, the platform offers three distinct service models. For users with data analysis expertise, iHi provides access to fully cleaned, de-identified datasets, with an option to incorporate external data. For users with research concepts but limited analytical skills, the iHi Incubator enables collaboration with data scientists from CMUH’s BDC to develop and complete research projects. Additionally, the platform includes specialized modules for genetic and microbiome analyses, allowing researchers to link phenotype, genotype, and microbiome data for advanced studies.

Fig. 4.

Infrastructure and Data Workflow for the iHi Platform’s Data Services.

All services are conducted within a secure, personalized workspace through a Virtual Desktop Infrastructure (VDI), ensuring that data remain within the iHi environment. This setup allows users to securely access and analyze data remotely. All data are sourced exclusively from individuals who have provided consent, and CMUH has implemented a real-time consent revocation feature since 2022 via its patient service app, CARES.

5.2. Data de-identification

CMUH is the first hospital in Taiwan to achieve both domestic and international dual certifications for its clinical data and image database, meeting the ISO 29100 and 29191 standards as well as CNS 29191 and CNS 29100-2 standards. The iHi platform utilizes the BDC’s automated de-identification system (Fig. 5) to ensure strong data security while supporting diverse data applications. This system applies various automated de-identification techniques, including generalization, suppression, k-anonymity, and image masking, to protect sensitive health information [44]. After de-identification, a comprehensive risk assessment is performed, followed by consistency checks and public database searches to confirm that no personally identifiable information remains.

Fig. 5.

ISO-Certified De-identification Workflow for the iHi Database.
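
As a rough illustration of how generalization, suppression, and k-anonymity fit together, the Python sketch below coarsens two quasi-identifiers and then drops any record whose combination of quasi-identifiers is shared by fewer than k records. The field names (`age`, `zip`, `diagnosis`), the 10-year age bands, and the three-digit postal prefix are hypothetical choices for this example; the sketch is not the BDC's de-identification system and omits the risk assessment and image masking steps described above.

```python
from collections import Counter


def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Generalization: keep only the leading digits of a postal code."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def k_anonymize(records: list, k: int) -> list:
    """Generalize quasi-identifiers, then suppress groups smaller than k."""
    generalized = [
        {
            "age_band": generalize_age(r["age"]),
            "zip": generalize_zip(r["zip"]),
            "diagnosis": r["diagnosis"],
        }
        for r in records
    ]
    group_sizes = Counter((g["age_band"], g["zip"]) for g in generalized)
    # Suppression: drop records whose quasi-identifier group is too small.
    return [g for g in generalized if group_sizes[(g["age_band"], g["zip"])] >= k]


def smallest_group(records: list) -> int:
    """Size of the smallest quasi-identifier group (0 for an empty table)."""
    sizes = Counter((r["age_band"], r["zip"]) for r in records)
    return min(sizes.values()) if sizes else 0


# Made-up records for illustration only.
example_records = [
    {"age": 34, "zip": "40401", "diagnosis": "I10"},
    {"age": 37, "zip": "40447", "diagnosis": "E11"},
    {"age": 31, "zip": "40402", "diagnosis": "I10"},
    {"age": 72, "zip": "10001", "diagnosis": "N18"},
]
released = k_anonymize(example_records, k=2)
```

In this toy table the single 70-79 record is suppressed because no other record shares its generalized quasi-identifiers, so every released record is indistinguishable from at least one other on those fields.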

To further enhance privacy protections, the platform continuously adopts new technologies to address emerging data types, such as those generated by wearable devices like smartwatches and fitness trackers. Additionally, CMUH is actively pursuing additional international certifications, including ISO/IEC 20889:2018 (Privacy-Enhancing Data De-identification Terminology and Classification of Techniques) and ISO/IEC 27559:2022 (Information Security, Cybersecurity, and Privacy Protection - Privacy-Enhancing Data De-identification Framework).

5.3. Data cleaning and harmonization: smart data chip workflow

Data derived from routine clinical practice often contain errors and inconsistencies, making data quality—encompassing completeness, consistency, and validity—critical for constructing accurate patient digital twins (PDTs) and supporting AI-driven healthcare solutions [45]. To address these challenges, we propose a comprehensive data cleaning and harmonization framework known as the Smart Data Chip Workflow (Fig. 6). This workflow underpins the iHi platform’s database through systematic quality control processes, transforming raw clinical data into high-quality, research-ready datasets.

Fig. 6.

Smart data chip workflow for data cleaning and harmonization.

The workflow begins by integrating clinical data from multiple sources to break down data silos. These data are structured within a well-defined architecture, ensuring logical consistency and proper formatting, thereby facilitating seamless integration across institutions. Domain experts, alongside various data-cleaning methodologies (e.g., outlier detection and unit consistency checks), refine the data to ensure accurate terminology mapping and the resolution of inconsistencies. To further enhance interoperability, clinical data ontologists label and classify the data, ensuring alignment with multiple formats and international frameworks, such as HL7 and FHIR.

Next, the data are organized into specific themes and modules, followed by rigorous plausibility validation to ensure consistency with clinical practices and medical standards. A series of statistical analyses and validation steps (e.g., descriptive statistics) are then conducted to verify data accuracy and reliability. Each step in this process is meticulously documented, generating a comprehensive data lineage that tracks the entire history of data processing, thereby ensuring transparency and reproducibility.
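
The cleaning and lineage steps described above can be sketched for a single laboratory item. The Python example below harmonizes serum creatinine to one unit, flags implausible values, and records each action in a lineage log. The record layout, the function name, and the plausibility bounds are assumptions chosen for illustration and are not taken from the Smart Data Chip Workflow itself; the conversion factor (1 mg/dL of creatinine is about 88.4 µmol/L) is the standard one.

```python
UMOL_PER_MGDL = 88.4  # creatinine: 1 mg/dL is approximately 88.4 umol/L


def harmonize_creatinine(rows: list, plausible: tuple = (0.1, 30.0)):
    """Convert creatinine to mg/dL, flag outliers, and log every action.

    `plausible` holds hypothetical lower and upper bounds in mg/dL.
    Returns (cleaned_rows, lineage_log).
    """
    cleaned, lineage = [], []
    for row in rows:
        value, unit = row["value"], row["unit"]
        if unit == "umol/L":
            # Unit consistency check: bring everything to mg/dL.
            value = round(value / UMOL_PER_MGDL, 2)
            lineage.append({"id": row["id"], "step": "unit_conversion",
                            "detail": "umol/L -> mg/dL"})
        elif unit != "mg/dL":
            # Unknown units are rejected rather than guessed.
            lineage.append({"id": row["id"], "step": "rejected",
                            "detail": f"unknown unit {unit}"})
            continue
        # Outlier detection: simple range check against the bounds.
        is_outlier = not (plausible[0] <= value <= plausible[1])
        if is_outlier:
            lineage.append({"id": row["id"], "step": "outlier_flag",
                            "detail": f"{value} mg/dL outside {plausible}"})
        cleaned.append({"id": row["id"], "value_mg_dl": value, "outlier": is_outlier})
    return cleaned, lineage


# Made-up measurements for illustration only.
raw_rows = [
    {"id": 1, "value": 1.0, "unit": "mg/dL"},
    {"id": 2, "value": 88.4, "unit": "umol/L"},
    {"id": 3, "value": 250.0, "unit": "mg/dL"},
    {"id": 4, "value": 1.2, "unit": "g/L"},
]
cleaned_rows, lineage_log = harmonize_creatinine(raw_rows)
```

Flagging outliers instead of deleting them keeps the decision reviewable by a domain expert, and the lineage log gives each downstream dataset a traceable processing history.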

The final product, developed through the Smart Data Chip Pipeline, ensures data interoperability and expandability, akin to LEGO bricks. This allows for customization to meet specific user or client needs (Fig. 7). Users can select and combine data modules to construct research-specific databases, supporting advanced analytics, AI model development, and a wide range of clinical and research applications.

Fig. 7.

iHi Platform Data Organization: Modular Data Bricks to Enhance Research Flexibility and Interoperability. Panel (A) presents a multi-level coronary artery disease (CAD) cohort, categorized into “Confirmed CAD,” “Probable CAD,” and “Possible CAD” groups, each using different combinations of data modules such as ICD codes, CAD medications, percutaneous coronary intervention (PCI), and coronary artery bypass grafting (CABG). Panel (B) illustrates a customizable Atrial Fibrillation cohort, where User A, User B, and User C build datasets by selecting modules such as ICD codes, electrocardiogram (ECG) reports, and echocardiogram (Echo) reports according to their specific research needs. This modular design supports flexible, personalized data configurations for a variety of research applications. CABG, coronary artery bypass grafting; CAD, coronary artery disease; ECG, electrocardiogram; PCI, percutaneous coronary intervention.
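
The modular "data brick" idea can be sketched with set operations over patient identifiers. In the Python example below, each module is a set of patient IDs and a cohort tier is defined by how the modules are combined. The tiering rule used here (diagnosis code plus a revascularization procedure for "confirmed", diagnosis code plus medication for "probable", diagnosis code alone for "possible") is a hypothetical rule invented for this illustration; the figure does not specify the actual combinations, and the module names are likewise assumptions.

```python
def build_cad_tiers(modules: dict) -> dict:
    """Combine module-level patient-ID sets into three mutually exclusive tiers.

    Hypothetical rule, for illustration only:
      confirmed = ICD code AND (PCI OR CABG)
      probable  = ICD code AND CAD medication, not already confirmed
      possible  = ICD code only
    """
    icd = modules["icd_codes"]
    medications = modules["cad_medications"]
    procedures = modules["pci"] | modules["cabg"]

    confirmed = icd & procedures
    probable = (icd & medications) - confirmed
    possible = icd - confirmed - probable
    return {"confirmed": confirmed, "probable": probable, "possible": possible}


# Made-up patient IDs for illustration only.
example_modules = {
    "icd_codes": {1, 2, 3, 4},
    "cad_medications": {2, 3, 5},
    "pci": {1},
    "cabg": {2},
}
tiers = build_cad_tiers(example_modules)
```

Because each module is an independent, pre-cleaned unit, a different user could swap in other modules (for example, ECG or echocardiogram reports for an atrial fibrillation cohort) without changing the combination logic.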

6. Integrated multiomics analysis platforms: genomics and microbiome

6.1. Overview of iHi genomics

While many countries have established biobanks [46], genetic research often demands specialized analytical expertise, substantial computational resources, and extensive data storage to handle large-scale genomic datasets. To address these challenges and improve data accessibility within CMUH, the iHi Platform developed iHi Genomics, an automated genetic analysis platform. This platform integrates genetic data from approximately 400,000 individuals, complemented by 19 years of longitudinal clinical data.

iHi Genomics consolidates CMUH’s institutional infrastructure into a high-performance computing environment, featuring 128 processing threads, 2 TB of RAM, and 1.2 PB of storage capacity. Through optimized task scheduling and minimizing idle resource time, the platform maximizes both performance and efficiency. This enables users to perform genetic research with no programming requirements, streamlining the process and enhancing accessibility for researchers across disciplines.

6.2. Precision Medicine Project

The CMUH Precision Medicine Project (PMP), initiated in 2018, has continuously enrolled outpatient participants in a sequential manner, with all participants providing written informed consent [47]. The PMP collects genetic data using the TWBv2 genotyping array, a Taiwanese-specific custom SNP array based on the ThermoFisher Axiom platform. This array incorporates rare coding risk alleles derived from whole-genome sequencing data of Taiwanese samples [48]. Specifically, the TWBv2 array includes 114,000 risk variants spanning 2831 rare disease genes identified in the published literature and the ClinVar database, as well as 4100 variants related to drug metabolism and adverse drug reactions. Additionally, the array features 24,865 copy number variation (CNV) probes associated with known chromosomal aberrations and CNV regions [22].

All genetic data in the iHi Genomics platform undergo rigorous quality control assessments covering kinship, ancestry, missing data rates, allele frequencies, and adherence to Hardy–Weinberg equilibrium [49]. To examine associations between genes and clinical outcomes on a whole-genome scale, missing genotypes were imputed using the 1000 Genomes Project phase 3 East Asian panel as a reference [50]. Haplotype phasing was performed with SHAPEIT2, and imputation was conducted using IMPUTE2 [51,52], generating a total of 81,698,455 imputed SNPs. We retained imputed SNPs with an INFO score >0.7, yielding a final dataset of 22,537,475 imputed SNPs (Fig. 8). All quality control procedures were conducted using PLINK v2.0.19.
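The per-variant checks described above can be sketched as follows. This is a minimal illustration, not the platform's pipeline: the function names `snp_qc` and `passes_qc` are hypothetical, the thresholds for missingness, minor allele frequency, and Hardy–Weinberg p-value are assumed placeholders, and the Hardy–Weinberg check uses a chi-square approximation rather than the test implemented in PLINK. Only the INFO cutoff of 0.7 comes from the text.

```python
import math

def snp_qc(n_aa, n_ab, n_bb, n_missing):
    """Summarize one SNP from genotype counts (AA, AB, BB) and missing calls."""
    n_called = n_aa + n_ab + n_bb
    missing_rate = n_missing / (n_called + n_missing)
    p = (2 * n_aa + n_ab) / (2 * n_called)  # frequency of allele A
    q = 1 - p
    # Expected genotype counts under Hardy-Weinberg equilibrium
    expected = (p * p * n_called, 2 * p * q * n_called, q * q * n_called)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom
    hwe_p = math.erfc(math.sqrt(chi2 / 2))
    return {"missing_rate": missing_rate, "maf": min(p, q), "hwe_p": hwe_p}

def passes_qc(stats, info=None, max_missing=0.05, min_maf=0.01,
              min_hwe_p=1e-6, min_info=0.7):
    """Apply illustrative filters; only min_info=0.7 is taken from the text."""
    if stats["missing_rate"] > max_missing:
        return False
    if stats["maf"] < min_maf:
        return False
    if stats["hwe_p"] < min_hwe_p:
        return False
    # Imputed SNPs carry an INFO score; genotyped SNPs pass info=None
    if info is not None and info <= min_info:
        return False
    return True

balanced = snp_qc(360, 480, 160, 0)   # counts match HWE for p = 0.6
print(balanced, passes_qc(balanced, info=0.9))
```

A variant with no heterozygotes, such as `snp_qc(500, 0, 500, 0)`, yields a very small Hardy–Weinberg p-value and is rejected by `passes_qc`.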

Fig. 8.

A scalable genotype imputation pipeline built on a hybrid cloud and local system. Genotype imputation is performed using the SHAPEIT2 and IMPUTE2 tools. The pipeline integrates a cloud-based platform for scalable computational power and a local data platform to ensure data security and privacy.

As of 2024, the iHi Genomics cohort consists of 400,458 samples; the median age is 46.3 years (IQR, 30.1–62.3), and 45.8% of the cohort is male.

6.3. Standardized analytic pipelines for genome-wide association studies (GWAS) and Mendelian randomization

In 2022, we launched iHi Genomics, an automated platform designed to streamline Genome-Wide Association Study (GWAS) analyses. This platform facilitates the establishment of target cohorts by using the International Classification of Diseases (ICD) coding system and enables users to control the quality of genetic data, perform zero-programming analyses, and summarize GWAS results comprehensively through visualizations and interactive tables. By using the ICD Anchor interactive dashboard, users can select cases based on ICD codes for specific diseases, with the system automatically matching appropriate controls and providing visualizations of demographic data for both cases and controls. Once the study cohort is confirmed, the platform retrieves the corresponding genetic data from iHi Genomics, processes it through stringent quality control protocols, and proceeds with GWAS analysis using the GWAS Navigator module. Within 24 h, users receive a comprehensive GWAS report that contains details of each quality control step, a Quantile–Quantile (QQ) plot, a Manhattan plot, and an interactive annotation table of statistically significant SNPs (Fig. 9). This table enables users to filter results by rsID number, p-value, chromosome position, and gene symbols, and includes links to external public databases such as GeneCards for streamlined access to gene-related information [53].
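To make the statistics behind the report concrete, the sketch below shows a single-SNP allelic association test and the genomic inflation factor that is commonly read alongside a QQ plot. It is an illustration under stated assumptions rather than the GWAS Navigator implementation: the function names are hypothetical, the allelic chi-square test is one simple choice among several (production pipelines typically use regression with covariates), and the 5e-8 significance threshold is the conventional value, not one stated in the text.

```python
import math
import statistics

GENOME_WIDE_ALPHA = 5e-8        # conventional threshold (assumption)
CHI2_1DF_MEDIAN = 0.4549364     # median of a chi-square distribution with 1 df

def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """2x2 allelic chi-square test; returns (odds_ratio, chi2, p_value)."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 degree of freedom
    odds_ratio = (a * d) / (b * c)
    return odds_ratio, chi2, p_value

def genomic_inflation(chi2_values):
    """Lambda GC: observed median test statistic over the expected median."""
    return statistics.median(chi2_values) / CHI2_1DF_MEDIAN

# Allele counts for one hypothetical SNP: 300/700 in cases, 200/800 in controls
odds_ratio, chi2, p_value = allelic_test(300, 700, 200, 800)
print(round(odds_ratio, 3), round(chi2, 2), p_value < GENOME_WIDE_ALPHA)
```

A Manhattan plot displays the negative log10 of such p-values by chromosomal position, and a lambda GC close to 1 suggests that the test statistics are not systematically inflated.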

Fig. 9.

User-friendly interface of iHi Genomics. The iHi Genomics platform consists of two functional modules: ICD Anchor (a) and GWAS Navigator (b). Each module features an intuitive user interface, enabling users to perform genome-wide association studies (GWAS) with no programming expertise required. The platform generates a comprehensive GWAS report (c), which includes detailed information on each quality control step, a Q–Q plot, a Manhattan plot, and an interactive SNP list. This report gives researchers a clear overview of their GWAS analysis.

Another application of genetic data is Mendelian randomization (MR), frequently used to evaluate the causal relationships between exposures and outcomes [54], especially when randomizing exposures would be ethically or practically infeasible [55]. MR leverages genetic variants as instrumental variables to emulate randomized trials and relies on three key assumptions [56]. First, the genetic instruments must be strongly associated with the exposure. Second, these instruments should not be associated with confounding variables. Third, no horizontal pleiotropy should exist, meaning that genetic instruments influence the outcome solely through the exposure. We adopted the “TwoSampleMR” R package to establish an MR pipeline, which supports a two-sample MR study and validates the critical assumptions of the MR model [57]. For example, users can assess the association between the genetic instruments and exposure by using the F-statistic, F = [R²(N − K − 1)] / [(1 − R²)K], where N is the sample size, K is the number of genetic variants in the instrument, and R² is the proportion of variance in the exposure explained by the genetic instrument [56]. Typically, F-statistics >10 suggest a strong association between instruments and exposure [58]. To address potential horizontal pleiotropy, we employed MR-Egger regression, where a nonsignificant intercept value indicates no evidence of pleiotropy affecting the outcomes [59]. Furthermore, sensitivity analyses were performed using various MR methods, including inverse-variance weighted (MR-IVW), MR-Egger regression, weighted median (MR-median), weighted mode (MR-mode), simple mode, and Mendelian Randomization Pleiotropy RESidual Sum and Outlier (MR-PRESSO) [60]. Although our MR services are currently customized according to specific applicant needs, a dedicated digital platform is under development.
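As a worked illustration of two quantities mentioned above, the following sketch computes the instrument-strength F-statistic and a fixed-effect inverse-variance weighted (IVW) estimate from summary statistics. It is written in Python for illustration and is not the TwoSampleMR pipeline; the function names and the input values are hypothetical, and the IVW formula shown is the standard fixed-effect form using first-order weights.

```python
import math

def f_statistic(r2, n, k):
    """F = R^2 (N - K - 1) / ((1 - R^2) K) for an instrument of K variants."""
    return (r2 * (n - k - 1)) / ((1 - r2) * k)

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate: weighted mean of per-variant Wald ratios.

    Each variant's ratio beta_outcome / beta_exposure is weighted by
    beta_exposure^2 / se_outcome^2 (first-order weights).
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    standard_error = math.sqrt(1 / total)
    return estimate, standard_error

# Hypothetical instrument: 10 variants explaining 2% of exposure variance
f_value = f_statistic(r2=0.02, n=10_000, k=10)
print(round(f_value, 2), f_value > 10)

# Hypothetical summary statistics for three variants
estimate, se = ivw_estimate([0.1, 0.2, 0.4], [0.05, 0.1, 0.2], [0.01, 0.02, 0.02])
print(round(estimate, 3), round(se, 4))
```

In this toy input every variant has the same Wald ratio, so the IVW estimate equals that ratio; with real data, heterogeneity among the ratios is one signal that motivates the sensitivity analyses listed above.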

6.4. Overview of iHi microbiome

Microbial marker genes, particularly 16S rRNA, are essential for microbiome profiling, offering rapid and reliable insights into microbial communities [61]. However, current computational tools for microbiome analysis often require users to have programming expertise for tool integration and statistical method selection, presenting a barrier for researchers without coding experience [62]. To address this challenge, we developed iHi Microbiome, a comprehensive, flexible, and user-friendly web-based platform for microbiome analysis. This platform integrates a wide array of functionalities, including sequence data processing and advanced analysis, to meet the needs of both novice and expert researchers.

The iHi Microbiome platform offers four core functions: (1) raw sequence data processing, (2) a one-click-for-all function, (3) advanced custom analysis, and (4) miscellaneous tools (Fig. 10). The raw sequence data processing feature enables robust data analysis, quality reporting, and accurate taxonomic assignment. The one-click-for-all function provides a simplified, user-friendly solution for various microbiome analyses.

Fig. 10.

The iHi Microbiome platform offers comprehensive functionality in four main areas: (1) Raw Sequence Data Processing, ensuring robust data analysis, quality control, and accurate taxonomic assignments; (2) One-click-for-all Analysis, providing a user-friendly interface for diverse analyses, including Alpha- and Beta-diversity, microbiome composition, differential abundance, and correlation analysis; (3) Advanced Custom Analysis, tailored to specific study requirements; and (4) Miscellaneous Tools. The One-click-for-all Analysis includes Alpha-diversity to assess richness and evenness within samples, Beta-diversity to measure community variability across samples, and microbiome composition profiling across different habitats. The summary table also includes a cladogram displaying differentially abundant taxa across habitats, thereby enhancing comparative insights across sample groups.

Users can perform complex analyses, such as microbial community biodiversity, microbiome composition, differential abundance, and correlation analyses, by selecting appropriate methods and adjusting parameters. The advanced custom analysis and miscellaneous tools offer more sophisticated, customizable options with detailed parameter settings, addressing specific research requirements, including functional prediction, pathway analysis, and a range of statistical models and data visualization tools. iHi Microbiome thus serves as an efficient, comprehensive, and adaptable tool for microbiome research. It is a free, open-access platform, requiring no login, ensuring broad accessibility. The platform is available at https://cmuh-ihi-microbiome.nchc.org.tw/. In comparison to other microbiome analysis platforms, such as MicrobiomeAnalyst and MOCHI, which offer a wide array of tools for microbiome data analysis, the iHi Microbiome platform is distinguished by its seamless integration within the iHi data ecosystem. This integrated approach allows users to link microbiome data with clinical, genomic, and environmental datasets, thereby facilitating comprehensive, multi-dimensional analyses. Such functionality provides an expanded perspective for microbiome research, enabling deeper insights into the complex interactions between microbiome composition and various health determinants.
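For readers unfamiliar with the diversity measures offered by the one-click-for-all function, the sketch below computes one common alpha-diversity index (Shannon) and one common beta-diversity measure (Bray–Curtis dissimilarity) from taxon count vectors. The choice of these two metrics is an assumption made for illustration; the text does not state which indices the platform uses, and the function names and counts are hypothetical.

```python
import math

def shannon_index(counts):
    """Alpha diversity: Shannon index (natural log) of one sample's taxon counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(counts_a, counts_b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples.

    Both vectors must list the same taxa in the same order.
    0 means identical composition; 1 means no shared taxa.
    """
    difference = sum(abs(a - b) for a, b in zip(counts_a, counts_b))
    return difference / (sum(counts_a) + sum(counts_b))

# Hypothetical read counts for four taxa in two samples
sample_even = [10, 10, 10, 10]
sample_skewed = [37, 1, 1, 1]

print(round(shannon_index(sample_even), 3))     # maximal evenness: ln(4)
print(round(shannon_index(sample_skewed), 3))   # dominated by one taxon
print(round(bray_curtis(sample_even, sample_skewed), 3))
```

The even sample scores higher on the Shannon index than the sample dominated by a single taxon, which reflects the "richness and evenness" reading of alpha diversity given in the figure caption.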

6.5. User interface design

The iHi Genomics and iHi Microbiome platforms are designed as streamlined, zero-programming data analytics tools aimed at reducing the learning curve for clinical researchers without bioinformatics expertise. Both platforms emphasize user-friendly, web-based dashboards that simplify navigation through complex datasets, enabling users to derive meaningful insights through intuitive operations and visually engaging data presentations. These platforms feature guided graphical user interfaces (GUIs), ensuring that all tasks can be performed without the need for coding knowledge.

6.6. Future perspectives

iHi Genomics represents Taiwan’s first large-scale platform seamlessly integrating genomic and phenotypic data. This integration holds significant potential for advancing precise disease risk assessments and predictions, guiding new drug development, and promoting innovation in clinical practice. By further connecting to environmental databases, our unique and robust data ecosystem enables researchers to comprehensively examine the interactions between genetic, environmental, and clinical factors in disease progression.

While this is an initial step toward the era of big data and AI, the iHi Platform, encompassing iHi Genomics and iHi Microbiome, provides a solid foundation for developing comprehensive patient digital twins (PDTs). This potential is amplified as diverse omics data, including transcriptomics, proteomics, and single-cell RNA sequencing, become integrated. Moreover, with the application of advanced IoT technologies, the platform could connect to real-time daily activities, creating a dynamic, avatar-like representation of patients.

Strategically, the iHi Digital Data Ecosystem is poised to integrate with future metaverse applications, enabling seamless, real-time interactions between digital avatars of patients, healthcare professionals, and their living environments.

7. Discussion and conclusion

The data ecosystem established by the iHi Platform forms the foundation for the development of patient digital twins (PDTs) through the integration of data from National Health Insurance (NHI) records, multi-omics profiles, and environmental exposures. This integration is further enhanced by disease management apps, which link the ecosystem to real-time data generated by patients, enabling comprehensive profiling of patients’ daily activities. Despite these advancements in data governance and quality, critical gaps remain, including the lack of dynamic integration with patients’ living environments and the absence of interconnected digital twins representing other healthcare professionals and patients. These missing elements are essential for fully leveraging the potential of cutting-edge AI technologies, such as generative AI and large language models.

The vision of a Global Patient Digital Twin Initiative (GPDTI) presents the challenge of international interoperability. Unifying diverse clinical data service platforms across various healthcare systems and hospitals could revolutionize the efficiency of AI solution validation and enable the creation of digital twin-driven virtual clinical trials. However, despite HL7’s efforts to promote the global adoption of FHIR, the integration of this cross-institution communication protocol has been slow [63]. The current standards development process, led by HL7, focuses on adapting to new data types, including mobile app data, patient audio and video recordings, and advanced multi-omics data, such as microbiome and proteomics information [64,65]. However, data standards governing AI-generated biomedical data, including interactions between generative AI and patients or clinical professionals, AI-driven disease prognosis predictions, and AI-simulated videos (e.g., gait analysis), remain underdeveloped and lack universal agreement. Given the variability of AI tool accuracy across healthcare institutions, implementing standardization mechanisms prior to integrating these inferences is essential. This may involve retraining AI models using all available data or applying bias-correction approaches to these inferences. A thorough evaluation of standardization strategies is necessary before deploying data standardization protocols for AI inferences [66]. Despite these challenges, AI-derived digital biomarkers or biosignatures are critical for realizing digital twins, which are pivotal for applications such as virtual clinical trials and disease trajectory simulations within the next generation of Health Information Systems (HIS)/Electronic Health Records (EHRs).

A key element of the GPDTI is the implementation of comprehensive data collection strategies that strictly adhere to privacy-preserving practices. This poses a significant challenge for the next generation of EHR systems, which require international interoperability while complying with national and regional patient privacy laws, such as HIPAA in the United States and the General Data Protection Regulation (GDPR) in the European Union. At CMUH, the BDC adheres to regulations like Taiwan’s Personal Information Protection Act, ensuring patient confidentiality while enabling data-driven healthcare innovations through a careful balance of access and protection. Furthermore, the export of clinical data, particularly genetic data, is tightly regulated by the Taiwanese government. Therefore, the principle of “data never leaving” serves as a fundamental guideline. The iHi Platform employs Virtual Desktop Infrastructure (VDI) services to uphold this policy in international AI research collaborations, complemented by rigorous data anonymization and de-identification techniques using AI methods. However, the “data never leaving” policy, which restricts data to physical locations within Taiwan, may hinder the progress of the GPDTI, as the exchange of data at the digital twin level, especially multi-omics data, is essential for meaningful collaboration. To address this, the development of an intelligent consent management system is crucial, ensuring that patients are fully informed about the scope and purpose of data sharing with third parties, including research institutions and commercial entities. Importantly, patients must have the ability to revoke consent at any time for specific types of information, in line with GDPR principles. Additionally, all data service platforms, including iHi, should adopt stringent access controls based on the principle of least privilege, implement real-time monitoring, and maintain detailed logs of data access. Despite these safeguards, the potential for irreversible damage to patient rights and organizational reputation in the event of a data breach remains a serious concern that cannot be fully mitigated through legal action. Establishing a secure yet open data ecosystem to support the integration of AI in clinical practice will require substantial legal and regulatory efforts, especially in an era of rapid advancements in data science and AI technology.
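The combination of revocable, per-data-type consent, least-privilege access, and access logging described above can be sketched in a few lines. This is a toy model for illustration only and does not describe the iHi Platform's implementation: the class `ConsentRegistry`, its methods, and the identifiers used in the example are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ConsentRegistry:
    # (patient_id, data_type) pairs with currently active consent
    granted: set = field(default_factory=set)
    # Append-only record of every access decision
    access_log: list = field(default_factory=list)

    def grant(self, patient_id, data_type):
        self.granted.add((patient_id, data_type))

    def revoke(self, patient_id, data_type):
        # Revocation applies to one data type and takes effect immediately
        self.granted.discard((patient_id, data_type))

    def request_access(self, user, user_scopes, patient_id, data_type):
        """Allow access only if the user's role covers this data type
        (least privilege) and the patient's consent is still active."""
        allowed = data_type in user_scopes and (patient_id, data_type) in self.granted
        self.access_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "patient_id": patient_id,
            "data_type": data_type,
            "allowed": allowed,
        })
        return allowed

registry = ConsentRegistry()
registry.grant("P001", "genomic")
print(registry.request_access("analyst_a", {"genomic"}, "P001", "genomic"))
registry.revoke("P001", "genomic")
print(registry.request_access("analyst_a", {"genomic"}, "P001", "genomic"))
```

Denied requests are logged alongside granted ones, since a record of refused attempts is part of what makes real-time monitoring useful.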

As AI-driven systems become integral to healthcare, effective error management is essential for establishing a framework of Trustworthy AI and fostering digital trust among clinicians and patients. While AI significantly enhances diagnostic and therapeutic decision-making, it also presents potential risks associated with errors and biases originating from training data. To mitigate these risks, explainable AI tools [67], such as SHAP, play a critical role by providing transparency into model decision-making processes, thereby enhancing model reliability and enabling healthcare professionals to interpret the AI’s rationale. Furthermore, practice-aligned calibration (PAC) methods can help AI models adapt to varying clinical contexts [68], promoting consistent performance and reinforcing trust in the accuracy and safety of AI applications. Together, these strategies contribute to a secure, transparent, and ethical framework for AI-driven healthcare, which is essential as the iHi Platform advances toward broader applications in clinical practice.
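SHAP explains a prediction by attributing it to input features using Shapley values. The sketch below computes exact Shapley values by brute-force enumeration for a toy model, which is feasible only for a handful of features; it is not the SHAP library, which relies on faster approximations and model-specific algorithms. The function `shapley_values`, the toy risk score, and the feature values are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline).

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    attributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        attributions.append(total)
    return attributions

def toy_risk_score(z):
    # Hypothetical linear score over three standardized features
    return 2.0 * z[0] + 3.0 * z[1] - 1.0 * z[2]

patient = [1.0, 2.0, 3.0]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk_score, patient, reference)
print(phi, sum(phi), toy_risk_score(patient) - toy_risk_score(reference))
```

The attributions sum to the difference between the patient's score and the reference score, which is the additivity property that lets clinicians read each value as a feature's contribution to that specific prediction.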

In conclusion, the iHi Platform represents a significant milestone in Taiwan’s evolution of medical big data by transforming conventional EHRs into a sustainable, infinite data ecosystem. Supported by government and regulatory bodies, the iHi platform is poised to drive innovation beyond the traditional learning health system within local healthcare institutions. More broadly, it aims to establish a comprehensive data ecosystem that connects to national and global policymaking, drives transformation in the international healthcare industry, and supports robust academic applications across both basic and applied sciences, including AI in medicine and public health.

Acknowledgments

We thank the iHi Clinical Research Platform of the Big Data Center of CMUH for data exploration and administrative support.

Funding Statement

The study was funded by grants from China Medical University Hospital (grants DMR-111-037, DMR-112-119, DMR-112-188, and DMR-113-090) and the National Science and Technology Council of Taiwan (grants 111-2628-B-039-004-MY3 and 111-2314-B-039-085).

Footnotes

Financial support: The funding sources did not play any role in the design or conduct of the study, collection, management, analysis, interpretation of the data, or preparation, review, or approval of the manuscript.

Conflict of interest: The authors declare they have nothing to disclose.

References

  • 1. Alowais SA, Alghamdi SS, Alsuhebany N, Alqahtani T, Alshaya AI, Almohareb SN, et al. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMCMed Educ. 2023;23:689. doi: 10.1186/s12909-023-04698-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Bohr A, Memarzadeh K. The rise of artificial intelligence in healthcare applications. Artif Intell Healthcare. 2020:25–60. [Google Scholar]
  • 3. Secinaro S, Calandra D, Secinaro A, Muthurangu V, Biancone P. The role of artificial intelligence in healthcare: a structured literature review. BMC Med Inf Decis Making. 2021;21:125. doi: 10.1186/s12911-021-01488-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Pavlenko E, Strech D, Langhof H. Implementation of data access and use procedures in clinical data warehouses. A systematic review of literature and publicly available policies. BMC Med Inf Decis Making. 2020;20:157. doi: 10.1186/s12911-020-01177-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Mann DM, Stevens ER, Testa P, Mherabi N. From silos to synergy: integrating academic health informatics with operational IT for healthcare transformation. NPJ Digit Med. 2024;7:185. doi: 10.1038/s41746-024-01179-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Katsoulakis E, Wang Q, Wu H, Shahriyari L, Fletcher R, Liu J, et al. Digital twins for health: a scoping review. NPJ Digit Med. 2024;7:77. doi: 10.1038/s41746-024-01073-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Sun T, He X, Li Z. Digital twin in healthcare: recent updates and challenges. Digit Health. 2023;9:20552076221149651. doi: 10.1177/20552076221149651. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Shortliffe EH. The evolution of electronic medical records. Acad Med. 1999;74:414–9. doi: 10.1097/00001888-199904000-00038. [DOI] [PubMed] [Google Scholar]
  • 9. Evans RS. Electronic health records: then, now, and in the future. Yearb Med Inform. 2016;(Suppl 1):S48–61. doi: 10.15265/IYS-2016-s006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Uslu A, Stausberg J. Value of the electronic medical record for hospital care: update from the literature. J Med Internet Res. 2021;23:e26323. doi: 10.2196/26323. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Ministry of Health and Welfare. Healthcare Certification Authority. MOHW web site. [Accessed October 20, 2024]. https://hca.nat.gov.tw/Intro/ProjectInfo .
  • 12.Ministry of Health and Welfare. National Health Insurance Administration MediCloud System. MOHW Web site. [Accessed October 20, 2024]. https://www.nhi.gov.tw/en/cp-43-28d42-23-2.html .
  • 13. Liu TJ, Lee HT, Wu F. Building an electronic medical record system exchanged in FHIR format and its visual presentation. Healthcare (Basel) 2023;11:2410. doi: 10.3390/healthcare11172410. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Janett RS, Yeracaris PP. Electronic medical records in the American health system: challenges and lessons learned. Cien Saude Colet. 2020;25:1293–304. doi: 10.1590/1413-81232020254.28922019. [DOI] [PubMed] [Google Scholar]
  • 15. Knevel R, Liao KP. From real-world electronic health record data to real-world results using artificial intelligence. Ann Rheum Dis. 2023;82:306–11. doi: 10.1136/ard-2022-222626. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Pai MMM, Ganiga R, Pai RM, Sinha RK. Standard electronic health record (EHR) framework for Indian healthcare system. Health Serv Outcomes Res Method. 2021;21:339–62. [Google Scholar]
  • 17. Callahan A, Ashley E, Datta S, Desai P, Ferris TA, Fries JA, et al. The Stanford Medicine data science ecosystem for clinical and translational research. JAMIA Open. 2023;6:ooad054. doi: 10.1093/jamiaopen/ooad054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Mount Sinai Data Warehouse. MSDW Data Contents. MSDW web site. [Accessed October 20, 2024]. https://labs.icahn.mssm.edu/msdw/data-sources/
  • 19. Leahy TP, Ramagopalan S, Sammon C. The use of UK primary care databases in health technology assessments carried out by the National Institute for health and care excellence (NICE) BMC Health Serv Res. 2020;20:675. doi: 10.1186/s12913-020-05529-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep pheno-typing and genomic data. Nature. 2018;562:203–9. doi: 10.1038/s41586-018-0579-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Denny JC, Rutter JL, Goldstein DB, Philippakis A, Smoller JW, Jenkins G, et al. The “all of us” research program. N Engl J Med. 2019;381:668–76. doi: 10.1056/NEJMsr1809937. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Feng YA, Chen CY, Chen TT, Kuo PH, Hsu YH, Yang HI, et al. Taiwan Biobank: a rich biomedical research database of the Taiwanese population. Cell Genom. 2022;2:100197. doi: 10.1016/j.xgen.2022.100197. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Kandi V, Vadakedath S. Clinical trials and clinical research: a comprehensive review. Cureus. 2023;15:e35077. doi: 10.7759/cureus.35077. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Agrawal R, Prabakaran S. Big data in digital healthcare: lessons learnt and recommendations for general practice. Heredity (Edinb) 2020;124:525–34. doi: 10.1038/s41437-020-0303-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Huang C, Koppel R, McGreevey JD, 3rd, Craven CK, Schreiber R. Transitions from one electronic health record to another: challenges, pitfalls, and recommendations. Appl Clin Inform. 2020;11:742–54. doi: 10.1055/s-0040-1718535. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Torab-Miandoab A, Samad-Soltani T, Jodati A, Rezaei-Hachesu P. Interoperability of heterogeneous health information systems: a systematic literature review. BMC Med Inf Decis Making. 2023;23:18. doi: 10.1186/s12911-023-02115-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. FitzHenry F, Resnic FS, Robbins SL, Denton J, Nookala L, Meeker D, et al. Creating a common data model for comparative effectiveness with the observational medical outcomes partnership. Appl Clin Inform. 2015;6:536–47. doi: 10.4338/ACI-2014-12-CR-0121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Mayo Clinic Platform. Mayo Clinic Platform Discover. MCP Web site. [Accessed October 20, 2024]. https://www.mayoclinicplatform.org/discover/
  • 29. Murugadoss K, Rajasekharan A, Malin B, Agarwal V, Bade S, Anderson JR, et al. Building a best-in-class automated de-identification tool for electronic health records through ensemble learning. Patterns (N Y) 2021;2:100255. doi: 10.1016/j.patter.2021.100255. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Grossmann C, Chua PS, Ahmed M, Greene SM. Sharing Health Data: the Why, the Will, and the Way Forward. Washington (DC): National Academies Press (US); 2022. [PubMed] [Google Scholar]
  • 31.Mount Sinai Data Warehouse. Mount Sinai data warehouse ecosystem. MSDW web site. [Accessed 2 October 2024]. https://labs.icahn.mssm.edu/msdw/about-us/
  • 32. Chu I, Miller R, Mathews I, Vala A, Sept L, O’Hara R, et al. AIR enough: building an academic data ecosystem to make real-world data available for translational research. J Clin Transl Sci. 2024;8:e92. doi: 10.1017/cts.2024.530. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Stanford University. Stanford data science resources. SU web site. [Accessed 20 October 2024]. https://med.stanford.edu/sdsr/datasets.html .
  • 34. Edwards L, Pickett J, Ashcroft DM, Dambha-Miller H, Majeed A, Mallen C, et al. UK research data resources based on primary care electronic health records: review and summary for potential users. BJGP Open. 2023;7 doi: 10.3399/BJGPO.2023.0057. BJGPO.2023.0057. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Oyinlola JO, Campbell J, Kousoulis AA. Is real world evidence influencing practice? A systematic review of CPRD research in NICE guidances. BMC Health Serv Res. 2016;16:299. doi: 10.1186/s12913-016-1562-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Law M, Couturier DL, Choodari-Oskooei B, Crout P, Gamble C, Jacko P, et al. Medicines and healthcare products regulatory agency’s “consultation on proposals for legislative changes for clinical trials”: a response from the trials methodology research partnership adaptive designs working group, with a focus on data sharing. Trials. 2023;24:640. doi: 10.1186/s13063-023-07576-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Medicines and Healthcare products Regulatory Agency. Data access. MHRA Web site. [Accessed 2 October 2024]. https://www.cprd.com/data-access .
  • 38. Bradley CJ, Penberthy L, Devers KJ, Holden DJ. Health services research and data linkages: issues, methods, and directions for the future. Health Serv Res. 2010;45:1468–88. doi: 10.1111/j.1475-6773.2010.01142.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Roblin DW, Rubenstein KB, Tavel HM, Goodrich GK, Ritzwoller DP, Certa JM, et al. Development of a common data model for a multisite and multiyear study of virtual visit implementation: a case study. Med Care. 2023;61(Suppl 1):S54–61. doi: 10.1097/MLR.0000000000001834. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Kellermann AL. Coverage matters: insurance and health care. Ann Emerg Med. 2002;40:644–7. doi: 10.1067/mem.2002.129724. [DOI] [PubMed] [Google Scholar]
  • 41. Lee PC, Kao FY, Liang FW, Lee YC, Li ST, Lu TH. Existing data sources in clinical epidemiology: the taiwan national health insurance laboratory databases. Clin Epidemiol. 2021;13:175–81. doi: 10.2147/CLEP.S286572. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Zeadally S, Bello O. Harnessing the power of Internet of Things based connectivity to improve healthcare. Internet Things. 2021;14:100074. [Google Scholar]
  • 43.China Medical University Hospital. CMUH Web site. [[Accessed 20 October 2024].]. https://www.cmuh.cmu.edu.tw/Home/CmuhIndex_EN?lang=1 .
  • 44. El Emam K, Dankar FK. Protecting privacy using k-anonymity. J Am Med Inform Assoc. 2008;15:627–37. doi: 10.1197/jamia.M2716. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Lewis AE, Weiskopf N, Abrams ZB, Foraker R, Lai AM, Payne PRO, et al. Electronic health record data quality assessment and tools: a systematic review. J Am Med Inform Assoc. 2023;30:1730–40. doi: 10.1093/jamia/ocad120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Lazareva TE, Barbitoff YA, Changalidis AI, Tkachenko AA, Maksiutenko EM, Nasykhova YA, et al. Biobanking as a tool for genomic research: from allele frequencies to cross-ancestry association studies. J Pers Med. 2022;12:2040. doi: 10.3390/jpm12122040. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Liu TY, Lin CF, Wu HT, Wu YL, Chen YC, Liao CC, et al. Comparison of multiple imputation algorithms and verification using whole-genome sequencing in the CMUH genetic biobank. Biomedicine (Taipei) 2021;11:57–65. doi: 10.37796/2211-8039.1302. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Wei CY, Yang JH, Yeh EC, Tsai MF, Kao HJ, Lo CZ, et al. Genetic profiles of 103,106 individuals in the Taiwan Biobank provide insights into the health and history of Han Chinese. NPJ Genom Med. 2021;6:10. doi: 10.1038/s41525-021-00178-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Marees AT, de Kluiver H, Stringer S, Vorspan F, Curis E, Marie-Claire C, et al. A tutorial on conducting genome-wide association studies: quality control and statistical analysis. Int J Methods Psychiatr Res. 2018;27:e1608. doi: 10.1002/mpr.1608. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. 1000 Genomes Project Consortium. Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. O’Connell J, Gurdasani D, Delaneau O, Pirastu N, Ulivi S, Cocca M, et al. A general approach for haplotype phasing across the full spectrum of relatedness. PLoS Genet. 2014;10:e1004234. doi: 10.1371/journal.pgen.1004234. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Howie BN, Donnelly P, Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 2009;5:e1000529. doi: 10.1371/journal.pgen.1000529. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Stelzer G, Rosen N, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, et al. The GeneCards suite: from gene data mining to disease genome sequence analyses. Curr Protoc Bioinformatics. 2016;54:1.30.1–1.30.33. doi: 10.1002/cpbi.5. [DOI] [PubMed] [Google Scholar]
  • 54. Chen HL, Chiang HY, Chang DR, Cheng CF, Wang CCN, Lu TP, et al. Discovery and 723 prioritization of genetic determinants of kidney function in 297,355 individuals from 724 Taiwan and Japan. Nat Commun. 2024;15:9317. doi: 10.1038/s41467-024-53516-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55. Cornish AJ, Tomlinson IPM, Houlston RS. Mendelian randomisation: a powerful and inexpensive method for identifying and excluding non-genetic risk factors for colorectal cancer. Mol Aspects Med. 2019;69:41–7. doi: 10.1016/j.mam.2019.01.002.
  • 56. Davies NM, Holmes MV, Davey Smith G. Reading Mendelian randomisation studies: a guide, glossary, and checklist for clinicians. BMJ. 2018;362:k601. doi: 10.1136/bmj.k601.
  • 57. Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, et al. The MR-Base platform supports systematic causal inference across the human phenome. Elife. 2018;7:e34408. doi: 10.7554/eLife.34408.
  • 58. Li P, Wang H, Guo L, Gou X, Chen G, Lin D, et al. Association between gut microbiota and preeclampsia-eclampsia: a two-sample Mendelian randomization study. BMC Med. 2022;20:443. doi: 10.1186/s12916-022-02657-x.
  • 59. Burgess S, Thompson SG. Interpreting findings from Mendelian randomization using the MR-Egger method. Eur J Epidemiol. 2017;32:377–89. doi: 10.1007/s10654-017-0255-x.
  • 60. Walker VM, Davies NM, Hemani G, Zheng J, Haycock PC, Gaunt TR, et al. Using the MR-Base platform to investigate risk factors and drug targets for thousands of phenotypes. Wellcome Open Res. 2019;4:113. doi: 10.12688/wellcomeopenres.15334.1.
  • 61. Johnson JS, Spakowicz DJ, Hong BY, Petersen LM, Demkowicz P, Chen L, et al. Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis. Nat Commun. 2019;10:5029. doi: 10.1038/s41467-019-13036-1.
  • 62. Hu B, Canon S, Eloe-Fadrosh EA, Anubhav, Babinski M, Corilo Y, et al. Challenges in bioinformatics workflows for processing microbiome omics data at scale. Front Bioinform. 2021;1:826370. doi: 10.3389/fbinf.2021.826370.
  • 63. Ayaz M, Pasha MF, Alzahrani MY, Budiarto R, Stiawan D. The Fast Health Interoperability Resources (FHIR) Standard: systematic literature review of implementations, applications, challenges, and opportunities. JMIR Med Inform. 2021;9:e21929. doi: 10.2196/21929.
  • 64. Falkenhein I, Bernhardt B, Gradwohl S, Brandl M, Hussein R, Hanke S. Wearable device health data mapping to Open mHealth and FHIR data formats. Stud Health Technol Inform. 2023;305:341–4. doi: 10.3233/SHTI230500.
  • 65. Wu PY, Cheng CW, Kaddi CD, Venugopalan J, Hoffman R, Wang MD. -Omic and electronic health record big data analytics for precision medicine. IEEE Trans Biomed Eng. 2017;64:263–73. doi: 10.1109/TBME.2016.2573285.
  • 66. Steerling E, Siira E, Nilsen P, Svedberg P, Nygren J. Implementing AI in healthcare—the relevance of trust: a scoping review. Front Health Serv. 2023;3:1211150. doi: 10.3389/frhs.2023.1211150.
  • 67. Ali S, Abuhmed T, El-Sappagh S, Muhammad K, Alonso-Moral JM, Confalonieri R, et al. Explainable artificial intelligence (XAI): what we know and what is left to attain trustworthy artificial intelligence. Inf Fusion. 2023;99:101805.
  • 68. Chen YC, Chiang HY, Kuo CC. Artificial intelligence in U.S. health care delivery. N Engl J Med. 2023;389:1442. doi: 10.1056/NEJMc2310288.

Articles from BioMedicine are provided here courtesy of China Medical University
