2020 Jan 27;107(4):753–761. doi: 10.1002/cpt.1736

Table 1.

Summary of recommendations from phase I of the HMA‐EMA Joint Big Data Task Force

Promote use of global, harmonized, and comprehensive standards to facilitate interoperability of data	Minimize the number of standards; strongly support the use of available global data standards or the development of new standards in fields where none are available to ensure early alignment. Where data cannot be standardized at inception, establish the regulatory requirements to confirm the validity of mapped data. Promote use of global open source file formats.
Characterization of data quality across multiple data sources is essential to understand the reliability of the derived evidence	Characterize and document data quality in a sustainable EU inventory. Establish minimum sets of data quality standards. Where possible, quality attributes (e.g., compliance with GCP requirements) should be integrated to facilitate selection of appropriate datasets for analysis. Implement data quality control measures. Establish a clear framework for the validation of innovative bioanalytical methods (e.g., ‘omics).
The development of timely, efficient, and sustainable frameworks for data sharing and access is required. Further support mechanisms are needed to promote a data sharing culture	Strongly recommend the establishment of distributed data networks to facilitate sharing of sensitive healthcare data. Develop guidance for robust data governance and data anonymization to deliver systems that secure patient trust. Establish disease‐specific minimum data elements to enable harmonization of data across, for example, national disease registries. Promote mandatory sharing of the analysis arising from data sharing activities (e.g., by publication or open sharing via data access platforms). Promote the sharing of qualified models. Support the development of policy initiatives to drive a data sharing culture that is mutually beneficial for all stakeholders. Patients should be partners in all discussions. Proactively drive and/or support data sharing platforms and initiatives. Require the submission of data management plans at the start of all data generation exercises. Establish accountability for users. Develop common principles for data anonymization to facilitate data sharing.
Promote mechanisms to enable data linkage to deliver novel insights. Facilitate harmonization of similar datasets	Encourage sharing of raw data, associated metadata, and processed data to enable meaningful data linkage. Proactively engage with initiatives to map terminologies to facilitate data linkage and timely data access, but ensure frameworks for consistent validation are simultaneously implemented. Support mechanisms to maintain up‐to‐date mappings across terminologies. Promote the inclusion of clinical outcome data relevant to regulatory questions in public databases.
Develop clear frameworks to enable the validation of analytical approaches to determine if they are appropriate to support regulatory decision making. Promote new analytical approaches for modeling of big data sets for regulatory purposes	Move the analysis to the data: actively support the development of novel analytical approaches (e.g., AI, machine learning) that are applicable across distributed data networks and do not require the physical transfer of data. Form an advisory group to (i) explore the applicability of novel analytics methodologies to support the development, scientific evaluation, and monitoring of medicinal products; and (ii) explore the most suitable data standards, IT architecture, and tools capable of enabling the analyses. Promote the increased utilization of scientific advice and the EMA Qualification Advice process to enable regulators to influence more mature approaches. Support, define, and validate the definition of innovative outcome measures and other approaches that leverage additional dimensions from high‐frequency or high‐dimensional data. Explore novel methodologies to improve the control of confounding in observational studies and other big data studies. Make data analysis plans publicly available for all studies submitted for regulatory approval. Strongly support the exploration of novel analytics approaches, such as natural language processing techniques to interrogate unstructured data. Agree and create guidelines on which level of validation, reproducibility, and trustworthiness of evidence is acceptable according to the regulatory application of the AI algorithm.
Regulatory guidance is required on the acceptability of evidence derived from big data sources	Identify the best format to enhance the agility of guidance development and revision in this fast‐moving field. Track concrete examples of procedures relevant to big data across the regulatory network to inform thinking. Establish pilot programs to develop informal discussion on acceptability. Initiate pilot studies to better understand the evidence generated on efficacy/effectiveness and safety from emerging datasets. Mandate transparency and format around study reporting for regulatory submission to document datasets, protocol, tools, and version used to promote reproducibility. Emphasize the need for outcome measures from novel data sources (e.g., m‐health devices) to be reflective of a defined clinical benefit.

Full details of the recommendations can be found in the phase I summary report of the HMA‐EMA Joint Big Data Taskforce.46

AI, artificial intelligence; EU, European Union; GCP, good clinical practice; HMA‐EMA, Heads of Medicines Agencies and the European Medicines Agency; IT, information technology.