Paediatric and Perinatal Epidemiology. 2025 Mar 12;39(3):285–286. doi: 10.1111/ppe.70014

Big Data Are Only as Good as the People and the Processes That Create Them: The EUROlinkCAT Success Story

Babak Khoshnood
PMCID: PMC11997232  PMID: 40075557


It would be short‐sighted and archaic to deny that, with the increasing availability of big data and advances in artificial intelligence, particularly machine learning and large language models, we have entered a new era of public health, medical research and scientific discovery. However, the current enthusiasm for these developments, which at times amounts to hype, can also have unintended negative consequences. This can happen if the construction of big data is not carefully planned and executed, particularly when administrative databases are used and analysed, whether by humans or machines.

The paper by Loane et al. [1] about the lessons learned from the EUROlinkCAT study is an excellent example of putting such concerns in a positive perspective. It underscores the added value of undertaking an ambitious linkage study that spans several different data sources in multiple countries. It also illustrates that one should not underestimate the knowledge, experience, training and effort required to create and analyse high‐quality, standardised linked big data that can address multiple questions and objectives. The paper is a good example of what economists call a ‘positive externality’ of the project: the experiences of those who led the project and subsequently analysed the data have resulted in recommendations that benefit us all.

EUROlinkCAT was funded by the European Union's Horizon 2020 Research and Innovation programme to assess mortality and morbidity outcomes in children with congenital anomalies. It was a population‐based cohort study of almost 100,000 children born between 1995 and 2014 with congenital anomalies. Seventeen population‐based European registries participated in the EUROlinkCAT study, from countries including Belgium, Croatia, Denmark, Finland, France, Germany, Italy, Malta, Netherlands, Norway, Portugal, Spain, Ukraine and the United Kingdom. All are members of the European Surveillance of Congenital Anomalies network (EUROCAT), whose Central Registry is at the Joint Research Centre in Italy (https://eu‐rd‐platform.jrc.ec.europa.eu/eurocat_en).

The EUROCAT registries use multiple data sources for case ascertainment and verification of diagnoses, with codes and data standardised according to EUROCAT guidelines. For the EUROlinkCAT study, routinely available data, including birth and death records and prescription and hospital discharge data, were harmonised to common data models. Central analysis scripts then produced aggregate tables for the analyses. Not all registries could provide the entire dataset, in particular the hospital discharge and prescription data. Nevertheless, all 17 registries linked their standardised congenital anomalies data to national or regional mortality records. To date, 35 papers based on these data have been published, and they have partly informed the recommendations issued in the article by Loane et al.
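The harmonise-then-aggregate pattern described above can be illustrated with a minimal sketch. It assumes that each registry maps its own fields to a shared schema and that a centrally written script is then run on the harmonised records, so that only counts are shared. The field names, the mapping and the outcome variable are hypothetical and are not taken from the EUROlinkCAT common data model.

```python
from collections import Counter

# Hypothetical mapping from one registry's local field names to a shared schema.
# These names are illustrative only, not the EUROlinkCAT common data model.
LOCAL_TO_COMMON = {
    "dob": "birth_year",
    "dx": "anomaly_code",
    "died": "died_before_age_1",
}


def harmonise(local_record):
    """Keep only mapped fields, rename them, and normalise their types."""
    common = {
        LOCAL_TO_COMMON[key]: value
        for key, value in local_record.items()
        if key in LOCAL_TO_COMMON
    }
    # Reduce a date string such as "2004-05-01" to the year of birth.
    common["birth_year"] = int(str(common["birth_year"])[:4])
    # Accept a few common encodings of "yes"; everything else counts as "no".
    common["died_before_age_1"] = common["died_before_age_1"] in ("Y", "1", 1, True)
    return common


def aggregate_table(records):
    """Count children per (anomaly code, outcome); no individual rows are returned."""
    counts = Counter()
    for record in records:
        counts[(record["anomaly_code"], record["died_before_age_1"])] += 1
    return dict(counts)


# Invented example records for a single registry.
local_records = [
    {"dob": "2004-05-01", "dx": "Q21.0", "died": "N", "name": "not shared"},
    {"dob": "2004-11-23", "dx": "Q21.0", "died": "Y", "name": "not shared"},
    {"dob": "2010-02-14", "dx": "Q54.1", "died": "N", "name": "not shared"},
]
table = aggregate_table(harmonise(r) for r in local_records)
```

In this sketch, identifying fields such as `name` never leave the registry because they are not part of the mapping, and the same `aggregate_table` function can be reused unchanged by every registry once its data follow the shared schema.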

The most important recommendation of Loane et al. [1] concerns terminations of pregnancy for foetal anomaly (TOPFA). These currently account for approximately 20% of all cases, and in certain registries, including ours in Paris, for almost one‐third of all cases [2]. Hence, surveillance of the prevalence and assessments of the outcomes of congenital anomalies are incomplete unless TOPFA are included [3]. Loane et al. recommend that all TOPFA cases be recorded in the mother's healthcare records. Moreover, they note that the current WHO ICD‐10 system is inadequate for recording TOPFA following prenatal diagnosis of a congenital anomaly, and they suggest ways of addressing this problem.

Another critical issue is that many national data systems currently lack a permanent, unique identification number assigned at birth that would enable accurate linkage across multiple healthcare and other databases. In addition, inaccurate and heterogeneous coding can be a significant barrier to using administrative databases to address questions related to congenital anomalies. This is particularly problematic for anomalies such as congenital heart defects [4, 5], which involve complex diagnoses and numerous codes. For example, the coding system preferred by paediatric cardiologists for congenital heart defects is the extended version of the International Paediatric and Congenital Cardiac Code, which includes over 10,000 individual codes [6]. Another important example is hypospadias, where coding has long been recognised as a major issue for effective surveillance of this anomaly [7]. To address these issues or to mitigate their impact, the authors recommend using extended versions of the ICD‐10 for coding rare anomalies or adapting other coding systems better suited for characterising particularly complex anomalies.

Finally, national and European policies that prevent the release of results based on a small number of children due to confidentiality concerns severely hinder the visibility of rare anomalies and the study of their causes and of their impact on individual patients and public health. The authors are correct in advocating policies that allow the release of small numbers to named, trusted researchers with disclosure agreements, thereby enabling more effective surveillance and evaluation studies on rare anomalies, especially in smaller registries or countries.
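A minimal sketch shows why such policies weigh most heavily on rare anomalies. It assumes a simple rule under which any non‐zero count below a threshold is withheld. The threshold of 5, the registry names and the counts are invented for illustration and do not reflect any specific national policy or EUROlinkCAT result.

```python
def suppress_small_counts(table, threshold=5):
    """Return a copy of the table with counts between 1 and threshold - 1 withheld.

    The threshold is an assumed value; real disclosure rules differ by country.
    Withheld cells are represented as None.
    """
    return {
        key: (count if count == 0 or count >= threshold else None)
        for key, count in table.items()
    }


# Invented counts of one rare anomaly in three hypothetical small registries.
registry_counts = {"registry_A": 3, "registry_B": 2, "registry_C": 4}

# Released separately, every registry's count is withheld.
released = suppress_small_counts(registry_counts)

# Pooled across registries, the same cases exceed the threshold and stay visible.
pooled = suppress_small_counts({"all_registries": sum(registry_counts.values())})
```

Under these assumptions the anomaly disappears entirely from registry‐level output, although the pooled count remains reportable. The sketch ignores complementary suppression, which real disclosure control often also requires.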

Administrative and other routinely available healthcare data certainly offer advantages in terms of time and costs. For example, the Système National des Données de Santé (SNDS) in France has provided valuable data for published studies on a variety of subjects [8, 9, 10], including post‐marketing surveillance of medication use and adverse effects, healthcare costs and clinical care pathways.

Nonetheless, the EUROlinkCAT study and the paper by Loane et al. show that significant barriers remain in using administrative databases for surveillance or research purposes in congenital anomalies. Although improvements are possible and will come over time, there is a need for continued funding and operation of population‐based registries of congenital anomalies that can provide consistent, high‐quality, standardised data over the long term. This is the only feasible means to safeguard effective surveillance of congenital anomalies, assess teratogens, including medications and environmental factors, and evaluate preventive health policies and the impact of changes in medical and surgical care in this field. Otherwise, it would be unrealistic to think that stakeholders and policymakers will have access to the necessary information for devising and evaluating health policies related to congenital anomalies, many of which are classified as Rare Diseases by the European Union.

In these disconcerting times of rather indiscriminate public spending cuts, it is essential to remember that you get what you pay for—and indeed, there is no free lunch.

Author Contributions

The author takes full responsibility for this article.

Disclosure

The author has nothing to report.

Conflicts of Interest

The author declares no conflicts of interest.

Funding: The author received no specific funding for this work.

Data Availability Statement

The author has nothing to report.

References

  • 1. Loane M., Morris J. K., and Garne E., “Recommendations for Improving Surveillance of Congenital Anomalies in Europe Using Healthcare Databases,” Paediatric and Perinatal Epidemiology (2025), 10.1111/ppe.13173.
  • 2. Monier I., Hachem S., Goffinet F., Martinez‐Marin A., Khoshnood B., and Lelong N., “Population‐Based Surveillance of Congenital Anomalies Over 40 Years (1981‐2020): Results From the Paris Registry of Congenital Malformations (remaPAR),” Journal of Gynecology Obstetrics and Human Reproduction 53 (2024): 102780.
  • 3. Khoshnood B., “Selection Bias in Studies of Birth Defects Among Livebirths: Much Ado About Nothing?,” Paediatric and Perinatal Epidemiology 34 (2020): 665–667.
  • 4. Dolk H., Loane M., and Garne E., “Congenital Heart Defects in Europe: Prevalence and Perinatal Mortality, 2000 to 2005,” Circulation 123 (2011): 841–849.
  • 5. Khoshnood B., Lelong N., Houyel L., et al., “Prevalence, Timing of Diagnosis and Mortality of Newborns With Congenital Heart Defects: A Population‐Based Study,” Heart 98 (2012): 1667–1673.
  • 6. Houyel L., Khoshnood B., Anderson R. H., et al., “Population‐Based Evaluation of a Suggested Anatomic and Clinical Classification of Congenital Heart Defects Based on the International Paediatric and Congenital Cardiac Code,” Orphanet Journal of Rare Diseases 6 (2011): 64.
  • 7. Dolk H., Vrijheid M., Scott J. E., et al., “Toward the Effective Surveillance of Hypospadias,” Environmental Health Perspectives 112, no. 3 (2004): 398–402, 10.1289/ehp.6398.
  • 8. Zureik M., Cuenot F., Weill A., and Dray‐Spira R., “Contribution of Real‐Life Studies in France During the COVID‐19 Pandemic and for the National Pharmaco‐Epidemiological Surveillance of COVID‐19 Vaccines,” Thérapie 78 (2023): 553–557.
  • 9. Lassalle M., Zureik M., and Dray‐Spira R., “Proton Pump Inhibitor Use and Risk of Serious Infections in Young Children,” JAMA Pediatrics 177 (2023): 1028–1038.
  • 10. Swital M., Drouin J., Miranda S., Bakchine S., Botton J., and Dray‐Spira R., “Use of Multiple Sclerosis Disease‐Modifying Therapies During Pregnancy in France: Nationwide Study Between 2010 and 2021,” Multiple Sclerosis 30 (2024): 227–237, 10.1177/13524585231223395.



Articles from Paediatric and Perinatal Epidemiology are provided here courtesy of Wiley
