Towards FAIRification of sensitive and fragmented rare disease patient data: challenges and solutions in European reference network registries

Bruna dos Santos Vieira; César H Bernabé; Shuxin Zhang; Haitham Abaza; Nirupama Benis; Alberto Cámara; Ronald Cornet; Clémence M A Le Cornec; Peter A C ’t Hoen; Franz Schaefer; K Joeri van der Velde; Morris A Swertz; Mark D Wilkinson; Annika Jacobsen; Marco Roos

doi:10.1186/s13023-022-02558-5

Orphanet J Rare Dis. 2022 Dec 14;17:436. doi: 10.1186/s13023-022-02558-5

PMCID: PMC9749345 PMID: 36517834

Abstract

Introduction

Rare disease patient data are typically sensitive, present in multiple registries controlled by different custodians, and non-interoperable. Making these data Findable, Accessible, Interoperable, and Reusable (FAIR) for humans and machines at source enables federated discovery and analysis across data custodians. This facilitates accurate diagnosis, optimal clinical management, and personalised treatments. In Europe, twenty-four European Reference Networks (ERNs) work on rare disease registries in different clinical domains. The process and the implementation choices for making data FAIR (‘FAIRification’) differ among ERN registries. For example, registries use different software systems and are subject to different legal regulations. To support the ERNs in making informed decisions and to harmonise FAIRification, the FAIRification steward team was established to work as liaisons between ERNs and researchers from the European Joint Programme on Rare Diseases.

Results

The FAIRification steward team inventoried the FAIRification challenges of the ERN registries and proposed solutions collectively with involved stakeholders to address them. Ninety-eight FAIRification challenges from 24 ERNs’ registries were collected and categorised into “training” (31), “community” (9), “modelling” (12), “implementation” (26), and “legal” (20). After curating and aggregating highly similar challenges, 41 unique FAIRification challenges remained. The two categories with the most challenges were “training” (15) and “implementation” (9), followed by “community” (7), and then “modelling” (5) and “legal” (5). To address all challenges, eleven types of solutions were proposed. Among them, the provision of guidelines and the organisation of training activities, which ranged from less-technical “coffee-rounds” to technical workshops and from informal FAIR Games to formal hackathons, resolved the “training” challenges. Obtaining implementation support from technical experts was the solution type for tackling the “implementation” challenges.

Conclusion

This work shows that a dedicated team of FAIR data stewards is an asset for harmonising the various processes of making data FAIR in a large organisation with multiple stakeholders. Additionally, multi-levelled training activities are required to accommodate the diverse needs of the ERNs. Finally, the lessons learned from the experience of the FAIRification steward team described in this paper may help to increase FAIR awareness and provide insights into FAIRification challenges and solutions of rare disease registries.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13023-022-02558-5.

Keywords: FAIR, Stewardship, Rare disease, Patient registry, Data steward

Introduction

Rare diseases (RDs) are defined as life-threatening or chronically debilitating conditions that affect a low percentage of the population. In Europe, diseases are considered “rare” when their prevalence is less than 5 per 10,000 people [1]. Their low prevalence means that RD patient data are scarce and fragmented. Consequently, it is difficult to access sufficient data to support, for instance, research, drug development and improvements in outpatient care. Orphanet, the National Organisation for Rare Diseases (NORD) [2], and other initiatives around the world have deemed it important to improve collaboration for research [3] and Open Science for RD [4]. Such initiatives make it easier for people with RDs to share their data. In fact, the importance of data sharing is consistently emphasised by RD patients themselves [5]. To help with research on RDs, the European Joint Programme on Rare Diseases (EJP RD) was set up in 2018 [6]. The programme aims to solve the problem of fragmented information and to build a research ecosystem that makes the best use of data and resources, thus benefiting people with RDs. The EJP RD project collaborates directly with the 24 European Reference Networks (ERNs) [7], which involve more than 900 highly specialised healthcare units from more than 130 institutions in 35 countries [6]. Each ERN works on a subset of RDs and maintains registries of varying complexity. Some ERNs have a single centralised registry to which participating healthcare providers submit data, whereas others have registries established in their participating institutes, where each institute collects and maintains its data.

Unfortunately, because each ERN collects unique data, there are wide variations in terms of content, format, and language across their RD registries. This heterogeneity makes it virtually impossible to jointly analyse ERN data, wasting considerable time and effort for data analysts and affecting any large-scale research project aimed at improving RD patient care. For instance, counts of patients with similar symptoms, treatments for similar symptoms across different geographic regions, or time-to-diagnosis cannot be produced by a simple query across all registries. A patient representative searching for “genomes pertaining to a rare disease profile not yet classified as such” or a researcher analysing “observed phenotypes of citizens with the same genetic profile” with the aim to “identify correlations with regional factors” are examples of more complex queries that can be executed on multiple resources across institutes and countries. The premise of such queries, however, is that the data are Findable, Accessible, Interoperable, and Reusable (FAIR). It is, therefore, crucial to improve the Findability, Accessibility, Interoperability and Reusability (FAIRness or FAIR ‘maturity’) of the data collected in the RD registries of the 24 ERNs, for both humans and machines, as stated in the FAIR Guiding Principles [8]. When data are FAIR, they can be queried in an unambiguous and federated way, globally (if appropriate reuse conditions are met), without leaving their premises [9, 10]. In addition, an ecosystem based on FAIR principles adapts its functionality to its sources, because each source is self-explanatory.

Various methods can be applied for making data FAIR (also referred to as ‘FAIRification’) among the 24 ERNs, which contributes to diverging FAIRification methods and implementation choices throughout the network of ERNs. These differences are due to 1) different requirements and objectives (e.g., an initial focus on legal aspects, or a focus on internal queryability), 2) different software systems and tools (e.g., an Electronic Data Capture (EDC) system, the lack of a license for a specific ontology), 3) different disease domains (e.g., rare types of cancer, bone diseases), and 4) different jurisdictions (e.g., different laws between centres/countries). Applying different FAIRification methods theoretically still leads to interoperable solutions by definition, but overall, the process is not efficient for a community. Thus, harmonisation of methods and definitions and sharing of best practices would be beneficial to maximise the efficiency and benefit of FAIRification for all stakeholders.

Data can be made FAIR retrospectively, often long after they were collected, which may require extensive efforts to understand the meaning of the data [11–13]. Data can also be made FAIR when they are being collected, where the FAIRification steps are embedded in the data collection tool [14]. The latter was implemented for a VASCERN ERN registry, where data are made FAIR automatically and in real-time upon collection [15]. This FAIRification workflow can be reused by other ERNs across data collection platforms. Nevertheless, there is a need to guide the ERNs in achieving higher efficiency by aligning their implementation choices regarding tools (e.g., EDC software), standards (e.g., data representation syntaxes, ontologies), and legal decisions (e.g., sending data to a central registry in a different country versus several hospitals with their own FAIR databases, informed consent forms, data access policies, data processing and sharing agreements).

To harmonise FAIRification across ERN RD patient registries, a FAIRification steward team was established to act as liaisons between the ERNs and FAIR experts. These liaisons, supported by the EJP RD, provide a unique opportunity to investigate the ERNs’ understanding and application of the FAIR principles to enable the use of data across international borders in the RD field. This work aims to 1) identify the challenges in FAIRifying RD registries and 2) support European-wide harmonised FAIRification by proposing solutions in the RD field.

Methods

Organisation of the FAIRification steward team

The EJP RD FAIRification steward team was established on July 10th, 2020, to support and ensure harmonised FAIRification of ERN RD patient registries. The team is composed of six FAIR data stewards with different scientific backgrounds (biomedical science, software development, hospital management, public health, engineering) and education levels (BSc, MSc, and PhD). As illustrated in Fig. 1, the FAIR data stewards facilitate the communication between ERNs and EJP RD FAIR experts. Each FAIR data steward collects FAIRification challenges from the ERNs they are assigned to. Then, the team curates these challenges and submits them to the FAIR experts, who provide the knowledge that is needed for proposing solutions. The team conveys the challenges requiring customised and ongoing support for a single ERN to the relevant experts and requests specific solutions.

Fig. 1 — FAIRification steward team, EJP RD FAIR (principles, standards, and tools) experts, and European Reference Networks (ERNs) in a three-party interaction map. The FAIRification steward team works as liaisons between ERNs and EJP RD FAIR experts, collecting FAIRification challenges from ERNs, curating these challenges, providing them to experts, and returning consolidated knowledge from the experts to ERNs as proposed solutions. For single ERN requests, the team creates Expert-ERN communication channels (dashed line). The ERN team includes a project manager (or equivalent), a local data steward, and a developer (or software provider). The set of proposed solutions comprises workshops, where standards or tools are presented by experts; hackathons, where developers can try different tools themselves in a hands-on fashion; experience exchange between ERNs; and suggestions of existing implementations, tools, and resources

Each ERN formed a core FAIRification team, including a project manager or equivalent (e.g., data manager, registry manager), a clinical domain expert, a local data steward, and a developer. The developer could be replaced by the programming support of the hired EDC company. Each FAIR data steward supports four ERNs and is the backup for four other ERNs. The communication channels between each ERN and their FAIR data steward were established in an initial introductory meeting, and thereafter maintained in follow-up meetings on demand.

Identification of the FAIRification challenges

We identified the FAIRification challenges of the ERN RD patient registries in two main steps: collection of challenges and curation of challenges. The second step consists of three sub-steps: categorisation, rephrasing, and merging of challenges. These are further detailed in this subsection.

Firstly, the FAIR data stewards collected the challenges that ERNs had with making their RD patient registries FAIR based on an initial set of 77 tools and standards identified by EJP RD FAIR experts. The implementation status of each standard or tool was identified for each ERN (“Implemented”, “Plans to Implement”, “Need Expert Help”, “Implementing Assisted by Expert” or “Non-Applicable”), as exemplified in Table 1. Note that additional tools and standards could be added where applicable, as noted in the document. Questions and implementation details specific to a tool or standard were recorded for each ERN and used as the main input for the FAIRification challenges. These data were collected by the FAIR data stewards while meeting with ERNs and stored in a persistent and traceable document. To preserve privacy, access to these data is restricted to the associated EJP RD FAIR experts and FAIR data stewards. The FAIR data stewards continued to communicate with ERNs regularly to provide feedback and follow-up on their questions, which could lead to additional FAIRification challenges.

Table 1.

An excerpt of the document used to collect the implementation status of each tool and standard for each ERN

Function	Tool/standard name	ERN registry implementation status
Data model	CDE semantic model	Implemented
Set of data elements	Common data elements JRC	Implemented
Genes Ontology	HGNC	Plans to Implement
Genes Ontology	HUGO	Non-Applicable
Variant Ontology	HGVS	Plans to Implement
Phenotype Ontology	HPO	Needs expert help (see methods)
International Classification of Diseases	ICD-10	Non-Applicable
International Classification of Diseases	ICD-11	Implemented
Minimum Information About Biobank Data Sharing	MIABIS	Implementing assisted by expert

The first column describes functions related to tools and standards, which are listed in the second column. The last column tracks the implementation status of each tool or standard (“Implemented”, “Plans to Implement”, “Need Expert Help”, “Implementing Assisted by Expert” or “Non-Applicable”). The references to the tools can be found in the template of Additional file 1.
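The status-tracking document described above can be pictured as a small set of typed records. The following is only a minimal sketch, not the team's actual document format: the names `Status`, `StatusRecord`, and `needs_follow_up` are hypothetical, and treating “Need Expert Help” and “Implementing Assisted by Expert” as the statuses that signal a follow-up is an assumption made for illustration. The rows are a subset of the Table 1 excerpt.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """The five implementation statuses named in the text."""
    IMPLEMENTED = "Implemented"
    PLANS_TO_IMPLEMENT = "Plans to Implement"
    NEED_EXPERT_HELP = "Need Expert Help"
    IMPLEMENTING_ASSISTED = "Implementing Assisted by Expert"
    NON_APPLICABLE = "Non-Applicable"


@dataclass(frozen=True)
class StatusRecord:
    """One row of the tracking document for a single ERN registry (hypothetical layout)."""
    function: str
    tool: str
    status: Status


# A subset of the rows shown in the Table 1 excerpt.
records = [
    StatusRecord("Data model", "CDE semantic model", Status.IMPLEMENTED),
    StatusRecord("Genes Ontology", "HGNC", Status.PLANS_TO_IMPLEMENT),
    StatusRecord("Phenotype Ontology", "HPO", Status.NEED_EXPERT_HELP),
    StatusRecord("International Classification of Diseases", "ICD-10", Status.NON_APPLICABLE),
    StatusRecord("Minimum Information About Biobank Data Sharing", "MIABIS", Status.IMPLEMENTING_ASSISTED),
]


def needs_follow_up(rows):
    """Return the tools whose status involves expert input (an assumed rule)."""
    flagged = {Status.NEED_EXPERT_HELP, Status.IMPLEMENTING_ASSISTED}
    return [row.tool for row in rows if row.status in flagged]


print(needs_follow_up(records))  # ['HPO', 'MIABIS']
```

A controlled vocabulary for the status field keeps the entries comparable across ERNs, which is what makes the later curation step feasible.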

Secondly, all FAIRification challenges collected in the previous step by December 31st, 2020, were categorised, rephrased, and merged. All FAIRification challenges were categorised by: (1) “training”, specifying the need for training on a specific technology or concept; (2) “community”, requiring peer experience exchange; (3) “modelling”, relating to (meta)data models or conceptual modelling activities; (4) “implementation”, requiring programming expertise, such as the implementation of data exchange interfaces between systems; and (5) “legal”, describing questions about data sharing and reuse agreements, informed consent, or any related services (e.g., patient informed consent form). These categories were defined by the FAIRification steward team based on the commonalities identified among the challenges. The categories and their definitions are summarised in Table 2. With this categorisation, we standardised the presentation of common solutions to avoid the need for repeated referrals to experts.

Table 2.

List of categories and their definitions

Category	Definition
Training	Challenges related to inquiries for more information on a specific tool, standard, or a general concept
Community	Challenges involving activities of peers in the same community to achieve reuse and prevent duplicated effort
Modelling	Challenges involving the conceptualisation of data into data elements and bindings of standardised vocabularies to these data elements
Implementation	Challenges involving implementation of a specific tool or standard
Legal	Challenges related to inquiries about data sharing and reuse agreements, informed consent, or implementation of related services

Five categories were created to organise the FAIRification challenges of RD patient registries. The categories reflect the nature of the challenges: the need for training, to learn from others, information about modelling, implementation, or legal aspects

The FAIRification challenges after categorisation were rephrased and merged based on their content and commonalities. For instance, the two example challenges “We need hands-on help to implement the Common Data Element (CDE) [16] in REDCap (Research Electronic Data Capture) [17]” and “How can the CDE Semantic Model be implemented in Marvin XClinical [18]?” could be merged to one curated challenge “How to implement the CDE model [19] in my EDC system?”.
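The merging step can be illustrated with the two example challenges quoted above. This is a sketch under stated assumptions rather than the procedure the team used: curation was done manually by reviewers, so the merge key (`"cde-in-edc"`), the `merge` function, and the assignment of both examples to the “implementation” category are hypothetical choices made only for illustration.

```python
from collections import defaultdict

# Original challenges quoted in the text. The category and the merge key
# are assumptions; in practice reviewers assigned them by hand.
original_challenges = [
    ("implementation", "cde-in-edc",
     "We need hands-on help to implement the Common Data Element (CDE) in REDCap"),
    ("implementation", "cde-in-edc",
     "How can the CDE Semantic Model be implemented in Marvin XClinical?"),
]

# Rephrased wording chosen for each merge key (taken from the text).
curated_wording = {
    "cde-in-edc": "How to implement the CDE model in my EDC system?",
}


def merge(challenges, wording):
    """Group original challenges by (category, key) and attach the curated wording."""
    groups = defaultdict(list)
    for category, key, text in challenges:
        groups[(category, key)].append(text)
    return [
        {"category": category, "curated": wording[key], "sources": texts}
        for (category, key), texts in groups.items()
    ]


curated = merge(original_challenges, curated_wording)
print(len(original_challenges), "original ->", len(curated), "curated")
```

Keeping the source challenges attached to each curated challenge preserves traceability back to the ERN that raised them.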

All processes, i.e., categorisation, rephrasing, and merging, were reviewed by at least two independent reviewers. The FAIRification challenges that result from this processing are referred to as curated FAIRification challenges. The remaining inconsistencies were resolved in discussions with the entire team and, upon need, with EJP RD FAIR experts.

Proposing solutions to the FAIRification challenges

The FAIR data stewards defined solutions to the curated FAIRification challenges in collaboration with different stakeholders. The five stakeholder groups who contributed to the development of these solutions were: (1) ERN representatives, (2) EJP RD FAIR (principles, standards, and/or tools) experts, (3) EJP RD coordinators, (4) Joint Research Centre, and (5) software developers and providers. To maximise efficiency, we defined solutions capable of addressing the highest number of challenges simultaneously. For the challenges that could be solved using readily available single solutions, we directly contacted the relevant stakeholders. Further, for the challenges that required novel solutions to be developed, the recombination of existing solutions, a long-term effort, or the participation of multiple parties, we arranged various types of activities that allowed for brainstorming for all stakeholders including ERNs.

Results

Here we present the work by the EJP RD FAIRification steward team to support the FAIRification of ERN RD patient registries. This includes the list of identified FAIRification challenges and proposed solutions to the ERNs. The solutions were reused or developed with input from multiple internal and external stakeholders to ensure convergence.

Overview of FAIRification Challenges

Ninety-eight FAIRification challenges were collected from all 24 ERNs. Their respective counts for each category before (“original”) and after curation are shown in Table 3. The most common category was “training” (31) while the least common was “community” (9). The “implementation” category contained 26 challenges, “legal” contained 20, and finally “modelling” contained 12. More details on all original and curated challenges can be found in Additional file 2.

Table 3.

The number of FAIRification challenges for each category (training, community, modelling, implementation and legal) defined in our approach

FAIRification challenges	Training	Community	Modelling	Implementation	Legal
Original (98)	31	9	12	26	20
Curated (41)	15	7	5	9	5

The second and third rows show the number of challenges before and after curation, respectively

After curation, the total number of challenges was reduced to 41. The “implementation” category had the biggest absolute reduction (from 26 to 9). The “training” category was reduced from 31 to 15, “legal” from 20 to 5, “modelling” from 12 to 5, and “community” from 9 to 7. The “training” and “implementation” categories remained the most and second most common categories, respectively. On the other hand, “modelling” and “legal” were the categories with the lowest number of challenges after curation.
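The counts in Table 3 can be checked with a few lines of arithmetic. The sketch below uses only the numbers reported in the table; the function name `reductions` and the choice to report both absolute and relative reductions are illustrative additions, not part of the original analysis.

```python
# Counts per category before and after curation, as reported in Table 3.
original = {"training": 31, "community": 9, "modelling": 12,
            "implementation": 26, "legal": 20}
curated = {"training": 15, "community": 7, "modelling": 5,
           "implementation": 9, "legal": 5}


def reductions(before, after):
    """Return {category: (absolute drop, percentage drop)} for each category."""
    result = {}
    for category, count in before.items():
        drop = count - after[category]
        result[category] = (drop, round(100 * drop / count, 1))
    return result


summary = reductions(original, curated)

# The per-category counts add up to the reported totals.
assert sum(original.values()) == 98
assert sum(curated.values()) == 41

# Print categories from the largest to the smallest absolute reduction.
for category, (drop, pct) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{category:15s} -{drop:2d} ({pct}%)")
```

In absolute terms “implementation” shrinks the most (17 challenges), while in relative terms “legal” shrinks the most (75%), which suggests that legal questions were the most similar across ERNs.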

The fifteen curated “training” challenges were each related to a tool, standard, or concept, for example, CDEs, the CDE semantic model [19], mapping languages, the FAIR Data Point, registration of registries through the European Rare Disease Registry Infrastructure (ERDRI) [20], informed consent, pseudonymisation, and query (see Table 4). “More information on semantic data model” and “More information on the FAIR Data Point (FDP)” are examples of “training” challenges.