Abstract
Background
BaMaRa allows the secure collection and deidentified centralization of medical data from all patients followed up in the rare disease expert network in France, based on a minimum data set (SDM-MR). This article describes the implementation and deployment of the BaMaRa information system across the national territory, as well as data access requests made through BNDMR, the data warehouse that centralizes all BaMaRa data, during the 2015–2020 period.
Materials and Methods
The SDM-MR is made up of 60 interoperable items. It is routinely collected through BaMaRa in rare disease centers as part of care and transferred to BNDMR after deidentification and data reconciliation. Data access is regulated by a scientific committee.
Results
In total, 668 002 affected patients had an SDM-MR recorded in BNDMR by the end of 2020, with a mean of 3.7 activities per patient. Data access was provided for 66 projects.
Conclusion
The BaMaRa-BNDMR infrastructure provides an administrative and epidemiological resource for rare diseases in France.
Keywords: rare diseases, information systems, health information interoperability, data warehousing, data anonymization
INTRODUCTION
France is one of the few countries with a centralized health insurance database that covers the entire population exhaustively and permanently. The routinely recorded data include information on the use of care, hospitalizations, disability, social services, and professional activity.1
However, this database cannot be used to study the care pathways of most patients with rare diseases. Diagnoses are routinely coded using the International Classification of Diseases, 10th Revision (ICD-10), which describes rare diseases poorly.2 In the absence of dedicated codes, rare diseases are often identified with nonspecific or aggregate codes, making it impossible to analyze them using this database.
To promote epidemiological surveillance of rare diseases, 3 successive national plans dedicated to rare diseases focused on developing an appropriate nomenclature for rare diseases and on collecting patient care data from the expert rare disease network, which is organized into 23 major rare disease medical specialties.3
For this data collection, the expert rare disease network first turned to local solutions to record its activity with rare disease patients. This is how the CEMARA database was created in 2007 at the Necker Enfants Malades Hospital of the Assistance Publique-Hôpitaux de Paris (AP-HP).4,5 Built around a single, restricted data set shared by all participating centers,6 CEMARA made it possible to collect the activities carried out by the care team over time and to code rare diseases specifically using the Orphanet nomenclature.7 Additional data collections were possible to answer targeted research questions.8 CEMARA was thus deployed in nearly half of the rare disease centers. Based on this experience, a single epidemiological information system called BaMaRa was implemented in line with international recommendations.9 It allows the secure collection and deidentified centralization of medical data from all patients followed up in the rare disease expert network.10,11 It aims to build a homogeneous data collection based on a minimum data set (SDM-MR) that documents the care and health status of patients with rare diseases in French expert centers and supports a better assessment of the effect of the national plans. BaMaRa has been developed and is maintained by the National Rare Disease Databank (BNDMR) infrastructure team.
The development of minimal data sets to support rare disease registries has received interest in recent years,10,12 but real-life, large-scale implementation in an information system has rarely been described.13 This article describes the implementation and deployment of the BaMaRa information system across the national territory, and the data requests received, during the 2015–2020 period. It provides a unique evaluation of the implementation and use of a minimal data set for rare diseases at a national scale.
MATERIALS AND METHODS
The BaMaRa-BNDMR infrastructure to support SDM-MR collection
The SDM-MR Clinical Document Architecture
The SDM-MR was defined following a consultation process that brought together national experts and a review of the literature. It is divided into the following chapters: consent (regulatory), patient identification, administrative information, family information (if applicable), vital status, care course, care activity (ie, what was undertaken during the patient's visit, for instance, a medical consultation), history of the disease, diagnosis, confirmation of diagnosis, treatment, antenatal and neonatal course (if applicable), and research (if applicable). It is made up of 60 items and achieves both semantic and syntactic interoperability through the use of medical terminologies aligned with existing international standards (Orphanet, Human Phenotype Ontology, HUGO Gene Nomenclature Committee, etc.) and of the HL7 Clinical Document Architecture (CDA) (Figure 1). It is now a benchmark at the European level and has served, in particular, as a model for the establishment of a European minimum data set.14 To ensure semantic interoperability, each concept is transcoded into a (code, terminology) couple using the standard terminology best suited to define the concept.
Figure 1.
Health information systems interoperability framework (“Cadre d’Interopérabilité des systèmes d’information en santé,” CI-SIS) used for SDM-MR.
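To illustrate the (code, terminology) couple described above, the following minimal Python sketch parses coded values written in the style of the examples in Table 1 (eg, "ORPHA 881" or "HPO 0002676; ICD10 N30.0"). The class name `CodedConcept`, the function `parse_coded_concepts`, and the textual input format are assumptions made for illustration only; they are not part of the SDM-MR specification or of the HL7 CDA format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedConcept:
    """One concept expressed as a (code, terminology) couple."""
    terminology: str
    code: str


def parse_coded_concepts(raw: str) -> list[CodedConcept]:
    """Parse a string such as 'HPO 0002676; ICD10 N30.0' into couples.

    The '<terminology> <code>' layout separated by ';' is a hypothetical
    convention taken from the examples column of Table 1.
    """
    concepts = []
    for part in raw.split(";"):
        part = part.strip()
        if not part:
            continue
        terminology, _, code = part.partition(" ")
        if not code:
            raise ValueError(f"expected '<terminology> <code>', got {part!r}")
        concepts.append(CodedConcept(terminology, code.strip()))
    return concepts
```

Keeping the terminology next to the code means that two systems can exchange the same value without ambiguity about which nomenclature it belongs to.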
Collecting data through the BaMaRa application
The SDM-MR is routinely collected in rare disease centers as part of care in BaMaRa local databases (one local database per institution). BaMaRa, deployed as Software as a Service, allows the collection of the SDM-MR either by direct entry via the web interface (“BaMaRa-web”) or through interoperability with a dedicated form in the hospital's computerized patient record (“BaMaRa-connected”).
BNDMR data warehouse
All local databases, from both BaMaRa-web and BaMaRa-connected modes, as well as historical CEMARA data, are gathered into BNDMR, as authorized by CNIL decision DR-2019-113, after a deidentification and reconciliation process.
Deidentification is ensured through the use of the national rare disease identifier (IdMR). This 20-digit code is permanent and not reversible. It makes it possible to identify the same patient followed up in different centers while preserving the patient's privacy at the national level. It is based on the following data (exact match): surname, first name, date of birth, and sex. IdMR generation is integrated into local BaMaRa instances. Therefore, only deidentified data are transmitted to the BNDMR data warehouse.
Data reconciliation is performed in 2 steps, the first of which reconciles BaMaRa-connected and BaMaRa-web data. This step raises 2 challenges: identifying data blocks, in order to determine whether the data are already in the database, and defining the merge policy. A data block is defined as a group of related data identified by a key (eg, the diagnosis block is the group of all the diagnosis fields, identified by the diagnosis technical id). The merge policy distinguishes 2 cases: unique and repeatable data blocks. A unique data block is present only once in the database (eg, the patient administrative data block, identified by the patient identifier); in this case, priority is given to BaMaRa-connected. A repeatable data block can be instantiated several times (eg, the diagnosis block, as a patient can have several diagnoses). The data reconciliation process is illustrated in Figure 2.
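The actual IdMR algorithm is not described here. As a hedged illustration of how a permanent, non-reversible 20-digit identifier could be derived from surname, first name, date of birth, and sex, the sketch below uses a keyed hash (HMAC-SHA-256) reduced to 20 decimal digits. The function name `pseudo_idmr`, the secret key, the field separator, and the date format are all assumptions; this is not the method used by BaMaRa.

```python
import hashlib
import hmac


def pseudo_idmr(surname: str, first_name: str, birth_date: str, sex: str,
                secret: bytes = b"hypothetical-secret-key") -> str:
    """Derive a 20-digit pseudonymous identifier from four identity traits.

    Illustrative only: the keyed hash, the '|' separator, and the secret
    are assumptions, not the published IdMR specification.
    """
    # No fuzzy normalization is applied: as with an exact-match scheme,
    # any difference in spelling yields a different identifier.
    payload = "|".join([surname, first_name, birth_date, sex]).encode("utf-8")
    digest = hmac.new(secret, payload, hashlib.sha256).digest()
    # Reduce the 256-bit digest to 20 decimal digits, zero-padded.
    number = int.from_bytes(digest, "big") % 10**20
    return f"{number:020d}"
```

Because the same four traits always produce the same identifier, a patient seen in two centers can be linked without the traits themselves leaving the local database; the cost of exact matching is that a typo in any trait breaks the link.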
Figure 2.
Data reconciliation process.
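The merge policy between BaMaRa-web and BaMaRa-connected can be sketched as follows. This is a minimal illustration under stated assumptions: each source is represented as a dictionary of data blocks, repeatable blocks are themselves dictionaries keyed by their technical id, and when the same repeatable instance exists in both sources the BaMaRa-connected version is kept. That last rule, the function name `reconcile`, and the data layout are hypothetical; only the priority given to BaMaRa-connected for unique blocks comes from the description above.

```python
def reconcile(web: dict, connected: dict, repeatable: set[str]) -> dict:
    """Merge data blocks from BaMaRa-web and BaMaRa-connected (sketch)."""
    merged = {}
    for name in web.keys() | connected.keys():
        if name in repeatable:
            # Repeatable block: keep every instance, matched by technical id.
            instances = dict(web.get(name, {}))
            # Assumption: on an id collision, the connected instance wins.
            instances.update(connected.get(name, {}))
            merged[name] = instances
        else:
            # Unique block: priority is given to BaMaRa-connected.
            merged[name] = connected[name] if name in connected else web[name]
    return merged
```

For example, a patient whose administrative block exists in both sources keeps the BaMaRa-connected version, while diagnoses recorded separately in each source are all retained.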
BNDMR data access procedures
Data access through BaMaRa
An investigator who wishes to conduct a research project involving only retrospective data from patients cared for in his or her own expert center can access these data. Under French regulation, this type of project does not require assessment by an ethics committee. A user interface in the BaMaRa application provides this access.
Data access through BNDMR
Access to BNDMR data can only be granted for requests that serve the public interest and are relevant to the strategic interests of the BNDMR's missions and of the national plans. Data access is regulated by a scientific committee.
The scientific committee oversees the use of data within the BNDMR. It validates the operating mechanisms for data access, defines and validates data access levels for the different stakeholders (rare disease expert centers, industry, public actors, etc.), ensures that data access complies with French regulation, and advises on communication opportunities. The committee is headed by a researcher appointed by the French Health Ministry. Its members are appointed for a 3-year period.
The committee implemented the following principles to guarantee both ease of data access and compliance with the regulation. Data access requests are first assessed by the BNDMR team for feasibility and objectives. For requests assessed as feasible and consistent with BNDMR objectives (public interest and relevance to the strategic interests of the BNDMR's missions and of the national plans), data access is provided without any additional step for rare disease centers, and after assessment by the scientific committee otherwise. This procedure is not restricted to French applicants; foreign institutions may request data access using the same procedure.
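The routing of a data access request described above can be summarized in a few lines of Python. This is only a sketch of the decision logic: the names `Route` and `route_request` are hypothetical, and the outcome for requests that are not feasible or not aligned with BNDMR objectives (`NOT_PURSUED`) is an assumption, since the procedure does not state what happens in that case.

```python
from enum import Enum


class Route(Enum):
    DIRECT_ACCESS = "access provided without additional step"
    SCIENTIFIC_COMMITTEE = "assessment by the scientific committee"
    NOT_PURSUED = "request not pursued (assumed outcome)"


def route_request(feasible: bool, aligned_with_objectives: bool,
                  from_rare_disease_center: bool) -> Route:
    """Return the next step for a BNDMR data access request (sketch)."""
    # First filter: assessment by the BNDMR team.
    if not (feasible and aligned_with_objectives):
        return Route.NOT_PURSUED
    # Rare disease centers get access directly; all others go to committee.
    if from_rare_disease_center:
        return Route.DIRECT_ACCESS
    return Route.SCIENTIFIC_COMMITTEE
```

The same function applies to French and foreign requesters, since the procedure does not depend on the country of the institution.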
RESULTS
BaMaRa deployment
As of June 17, 2021, BaMaRa is deployed in 2125 expert rare disease centers out of 2223 (95%) in 81 different healthcare institutions.
In total, 668 002 affected patients had an SDM-MR recorded in BNDMR-DW by the end of 2020, for a total of 2 494 582 activities, that is, a mean of 3.7 activities per patient (Figure 3). The number of recorded activities peaked in 2019. Forty-nine percent of patients had only one recorded activity. Through record linkage, we identified 80 787 patients (12.1%) with activities in at least 2 different centers.
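The mean number of activities per patient and the share of patients seen in at least 2 centers follow directly from the counts reported above. A quick check in Python (variable names are illustrative):

```python
# Counts reported for BNDMR at the end of 2020.
patients = 668_002
activities = 2_494_582
multi_center_patients = 80_787

# Mean number of recorded activities per patient.
mean_activities = activities / patients

# Percentage of patients with activities in at least 2 different centers.
multi_center_share = 100 * multi_center_patients / patients

print(f"{mean_activities:.1f} activities per patient")   # 3.7
print(f"{multi_center_share:.1f}% seen in 2+ centers")    # 12.1
```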
Table 1.
Demographics, care course, activity, and diagnosis SDM-MR items and their completion rate
| Item name | Item type | Example | Item completion rate (%) |
|---|---|---|---|
| Consent | Demographics | Yes | 100 |
| Birth date | Demographics | 11/11/2011 | 100 |
| Sex | Demographics | Male | 100 |
| Birth place | Demographics | Paris | 91 |
| Living place | Demographics | Marseille | 95 |
| Vital status | Demographics | Alive | 72 |
| Referring physician type | Care course | Primary care physician | 90 |
| Inclusion date in rare disease center | Care course | 11/2/2015 | 100 |
| Rare disease center physician national identification number | Care course | 8-digits number | 98 |
| Rare disease center identification number | Care course | 6-digits alphanumeric | 100 |
| Activity date | Activity | 12/12/2018 | 100 |
| Activity context | Activity | Outpatient clinic | 99 |
| Activity objective | Activity | Emergency care | 99 |
| Status of the health professional carrying out the activity | Activity | Medical Doctor | 100 |
| Activity rare disease site number | Activity | S0123 | 100 |
| Rare disease (ORPHA) | Diagnosis | ORPHA 881 | 65 |
| Clinical description | Diagnosis | HPO 0002676; ICD10 N30.0 | 39 |
| Age at first sign | Diagnosis | 2 years | 95 |
| Diagnosis status at first visit | Diagnosis | Undetermined | 94 |
| Age at diagnosis | Diagnosis | 4 years | 85 |
| Current diagnosis status | Diagnosis | Confirmed | 97 |
| Diagnostic investigations | Diagnosis | Clinical | 56 |
When focusing on the demographics (consent, patient identification, administrative information), care course, activity, and diagnosis SDM-MR chapters, most SDM-MR items have a very low proportion of missing data (Table 1), but the proportion is higher for diagnosis items, particularly for clinical description.
Figure 3.
Cumulative number of activities and patients per year.
BNDMR data access during the 2015–2020 period
Data access was provided for 66 projects (see Supplementary Table S1). Among them, 48 (72%) were requests from rare disease networks; the others came from public institutions, except for one project from a private company. Nineteen of the 23 rare disease reference networks requested data access for at least one project. For 53 of the 66 projects (80%), data access was requested in either 2019 or 2020. Of the 66 projects, 61% required at least one item about diagnosis and 65% required at least one item about care description.
DISCUSSION AND CONCLUSION
This paper describes the setup of both an information system and a data warehouse for collecting a minimal data set for rare diseases, deployed over the whole French territory. Now recording care data for about 700 000 patients with rare diseases, it provides a valuable epidemiological resource for the rare disease expert network as well as for institutions and private stakeholders.
The BaMaRa-BNDMR initiative is unique in that it lies halfway between a medico-administrative database (used by the French Ministry of Health for activity-based funding of rare disease centers) and a registry collecting research data, including a patient-reported outcome (age at first signs). To our knowledge, such an information system has never been deployed before, as routine care funding and research are usually handled by different stakeholders. This achievement builds on a long history in France of reusing national health administrative databases for epidemiological purposes because of their exhaustiveness across the national territory.1,15 Exhaustiveness helps avoid the selection biases that plague epidemiological studies and will enhance the possibility of using BNDMR for clinical research.16 Of note, activity and diagnosis items have about the same completion rate, showing that it is possible to successfully deploy an information system useful for both administrative and research purposes.
Compared with a registry, BaMaRa offers the possibility of collecting information for all patients cared for by the expert network, including patients with no ascertained diagnosis. This particularity makes it possible to collect data on the delay to diagnosis and to obtain a global picture of diagnostic wandering, but also to organize specific registries for undiagnosed patients, such as the Undiagnosed Diseases Network,17 based on the SDM-MR as core variables, as recommended.14 This is particularly useful to obtain a global picture of the rare disease burden and to analyze care trajectories precisely in order to improve them.
A complete process was set up to manage duplicates, which are another major obstacle to the reuse of health information system data for research. This concerns both identity linkage and record linkage. It has already been noted that population-level surveillance systems for rare chronic conditions, such as congenital heart disease, require sophisticated identity reconciliation methods to prevent bias introduced by duplicate cases:18 failure to resolve duplicate cases between data sources would inflate the relationship between congenital heart disease severity and both morbidity and mortality outcomes by 15% among adults and adolescents in Colorado. Therefore, BaMaRa is a valid source for epidemiological purposes because of the high quality of its duplicate handling. The same holds for record linkage, as a positive correlation between duplicate records and disease severity has already been noted.19
A limitation of the BaMaRa information system is the limited number of clinical items available for research. Clinical description of the disease is limited to descriptors from the Human Phenotype Ontology. Therefore, in-depth biological or radiological description is not possible with BaMaRa data. To overcome this limitation, BaMaRa is developing specialized records as add-ons to the SDM-MR. This has already been done for fetopathology and for patients with a neuromuscular rare disease and no ascertained diagnosis.
In conclusion, the BaMaRa-BNDMR infrastructure for rare diseases was successfully implemented in France and provides information for both medical and administrative purposes at a national level.
FUNDING
The BNDMR operational team is subsidized by the French Ministry of Health, as part of its public interest policy.
AUTHOR CONTRIBUTIONS
A-SJ drafted the paper. A-SJ, CM, and AK had full access to the data used for this paper and take responsibility for the integrity of the data. A-SJ, CM, and AK did the analyses and take responsibility for the accuracy of the data analysis. TP provided all the materials related to BaMaRa-connected. A-SJ, CM, and AS helped on the methodological aspects. Cellule Opérationnelle BNDMR, with expertise in various fields (e-health, medical informatics, web development, semantics and knowledge management, biostatistics, computer networks and databases, and communication), contributed on all IT aspects (from collection of needs to functional and technical specifications to development, deployment, and subsequent data treatments). All authors critically revised the manuscript for important intellectual content and gave final approval for the version to be published.
SUPPLEMENTARY MATERIAL
Supplementary material is available at Journal of the American Medical Informatics Association online.
ACKNOWLEDGMENTS
The authors are grateful to the members of the French Rare Diseases Expert Network who are collecting the data. They are also grateful to the French Rare Diseases Data Repository (BNDMR) operational team, operating from Assistance Publique—Hôpitaux de Paris (AP-HP).
CONFLICT OF INTEREST STATEMENT
None declared.
DATA AVAILABILITY
The data underlying this article are available in the article and in its online supplementary material.
Contributor Information
Anne-Sophie Jannot, Banque Nationale de Données Maladies Rares, DSI-I&D, APHP, Paris, France; Université de Paris, Paris, France; HeKA team, Centre de Recherche des Cordeliers, Sorbonne Université, Inserm, Université de Paris, Paris, France.
Claude Messiaen, Banque Nationale de Données Maladies Rares, DSI-I&D, APHP, Paris, France.
Ahlem Khatim, Banque Nationale de Données Maladies Rares, DSI-I&D, APHP, Paris, France.
Thibaut Pichon, Banque Nationale de Données Maladies Rares, DSI-I&D, APHP, Paris, France.
Arnaud Sandrin, Banque Nationale de Données Maladies Rares, DSI-I&D, APHP, Paris, France.
References
- 1. Tuppin P, Rudant J, Constantinou P, et al. Value of a national administrative database to guide public decisions: from the système national d’information interrégimes de l’Assurance Maladie (SNIIRAM) to the système national des données de santé (SNDS) in France. Rev Epidemiol Sante Publique 2017; 65 (Suppl 4): S149–67.
- 2. Aymé S, Bellet B, Rath A. Rare diseases in ICD11: making rare diseases visible in health information systems through appropriate coding. Orphanet J Rare Dis 2015; 10 (1): 35.
- 3. Péton-Klein D. [Inaugural conference: what stakes face rare diseases in the French health care system]. Med Sci 2014; 30 Spe (1): 5–7.
- 4. Messiaen C, Le Mignot L, Rath A, et al. CEMARA: a Web dynamic application within a N-tier architecture for rare diseases. Stud Health Technol Inform 2008; 136: 51–6.
- 5. Landais P, Messiaen C, Rath A, et al. CEMARA an information system for rare diseases. Stud Health Technol Inform 2010; 160 (Pt 1): 481–5.
- 6. Choquet R, Maaroufi M, de Carrara A, Messiaen C, Luigi E, Landais P. A methodology for a minimum data set for rare diseases to support national centers of excellence for healthcare and research. J Am Med Inform Assoc 2015; 22 (1): 76–85.
- 7. Rath A, Olry A, Dhombres F, Brandt MM, Urbero B, Ayme S. Representation of rare diseases in health information systems: the orphanet approach to serve a wide range of end users. Hum Mutat 2012; 33 (5): 803–8.
- 8. Mantoo S, Meurette G, Wyart V, et al. The impact of anorectal malformations on anorectal function and social integration in adulthood: report from a national database. Colorectal Dis 2013; 15 (6): e330–5.
- 9. Kodra Y, Weinbach J, Posada-de-la-Paz M, et al. Recommendations for improving the quality of rare disease registries. Int J Environ Res Public Health 2018; 15 (8): 1644.
- 10. Ben Said M, Robel L, Messiaen C, et al. Patient information, consents and privacy protection scheme for an information system dedicated to pervasive developmental disorders. Stud Health Technol Inform 2014; 205: 755–9.
- 11. Maaroufi M, Landais P, Messiaen C, Jaulent M-C, Choquet R. Federating patients identities: the case of rare diseases. Orphanet J Rare Dis 2018; 13 (1): 199.
- 12. Furusawa Y, Yamaguchi I, Yagishita N, et al. National platform for Rare Diseases Data Registry of Japan. Learn Health Syst 2019; 3 (3): e10080.
- 13. Hilbert JE, Kissel JT, Luebbe EA, et al. If you build a rare disease registry, will they enroll and will they use it? Methods and data from the National Registry of Myotonic Dystrophy (DM) and Facioscapulohumeral Muscular Dystrophy (FSHD). Contemp Clin Trials 2012; 33 (2): 302–11.
- 14. Berger A, Rustemeier A-K, Göbel J, et al. How to design a registry for undiagnosed patients in the framework of rare disease diagnosis: suggestions on software, data set and coding system. Orphanet J Rare Dis 2021; 16 (1): 198.
- 15. Bezin J, Duong M, Lassalle R, et al. The national healthcare system claims databases in France, SNIIRAM and EGB: powerful tools for pharmacoepidemiology. Pharmacoepidemiol Drug Saf 2017; 26 (8): 954–62.
- 16. Stubbs A, Uzuner Ö. New approaches to cohort selection. J Am Med Inform Assoc 2019; 26 (11): 1161–2.
- 17. Yates J, Gutiérrez-Sacristán A, Jouhet V, et al. Finding commonalities in rare diseases through the undiagnosed diseases network. J Am Med Inform Assoc 2021; 28 (8): 1694–702.
- 18. Crume TL, Duca LM, Ong T, et al. Population-level surveillance of congenital heart defects among adolescents and adults in Colorado: implications of record linkage. Am Heart J 2020; 226: 75–84.
- 19. Wang EC-H, Wright A. Characterizing outpatient problem list completeness and duplications in the electronic health record. J Am Med Inform Assoc 2020; 27 (8): 1190–7.