Abstract
Purpose:
Given the limited information available on real-world data (RWD) sources covering pediatric populations, this study describes features of globally available RWD sources for pediatric pharmacoepidemiologic research.
Methods:
An online questionnaire about pediatric RWD sources and their attributes and capabilities was completed by members and affiliates of the International Society for Pharmacoepidemiology and representatives of nominated databases. All responses were verified by database representatives and summarized.
Results:
Of 93 RWD sources identified, 55 unique pediatric RWD sources were verified, including data from Europe (47%), United States (38%), multiregion (7%), Asia-Pacific (5%), and South America (2%). Most databases had nationwide coverage (82%) and were linkable to other databases (65%); the most common data types were electronic health/medical records (47%) and administrative claims (42%). Most (71%) had limited outside access (e.g., by approval or through local collaborators); only 10 (18%) databases were publicly available. Six databases (11%) reported having ≥20 million pediatric observations. Most (91%) included children of all ages (birth until 18th birthday) and contained outpatient medication data (93%), while half (49%) contained inpatient medication data. Many databases captured vaccine information for children (71%), and one-third had regularly updated data on pediatric height (31%) and weight (33%). Other pediatric data attributes captured include diagnoses and comorbidities (89%), lab results (58%), vital signs (55%), devices (55%), imaging results (42%), narrative patient histories (35%), and genetic/biomarker data (22%).
Conclusions:
This study provides an overview with key details about diverse databases that allow researchers to identify fit-for-purpose RWD sources suitable for pediatric pharmacoepidemiologic research.
Keywords: global databases, pediatric research, pharmacoepidemiology, real-world data sources, real-world evidence
Plain Language Summary (PLS):
Real-world data (RWD) sources play an important role in healthcare decision-making. Pharmacoepidemiologists use RWD sources to study the utilization, safety, and effectiveness of medical products, to conduct studies that support regulatory approvals, and for other health technology assessment work. In this study, we describe 55 unique pediatric RWD sources with verified attributes that are available globally for pediatric pharmacoepidemiologic, epidemiologic, and health services research. Most verified databases profiled in this study are located in Europe (47%) or North America (38%) and have nationwide coverage (82%); the most common data types are electronic medical (health) records (47%) and administrative claims (42%). Most profiled databases include children of all ages (birth until 18 years of age). Future studies should profile additional RWD sources useful for pediatric pharmacoepidemiologic research, particularly data sources outside of Europe and the United States.
INTRODUCTION
Pediatric pharmacoepidemiologic research has often been informed by a wide range of real-world data (RWD) sources to study disorders and treatments in children 0 to <18 years of age. Administrative claims databases, electronic health records (EHR)/electronic medical records (EMR), pharmacy data, product and disease registries, and health surveys are among the common RWD sources used to expand and diversify the pool of data utilized in this field of research.1-6 While RWD often cannot replace “gold standard” randomized clinical trials, real-world insights can inform clinical product development, support the planning and conduct of clinical trials, and consequently reduce their cost or duration.4 The number of clinical trials conducted in the pediatric population and the number of approved pediatric indications have increased over the last two decades following new legislation and regulations. However, the absolute number of children participating in clinical trials is low, and a majority of recent pediatric clinical trials are small, performed at single sites, and completed without reporting of trial results.7,8 Therefore, there is still a need for research in children outside of clinical trials to enhance the available data and generate additional, generalizable evidence in this population.7
Real-world evidence (RWE) derived from data generated during routine clinical practice is increasingly critical in decision-making related to medicinal products, including regulatory approval, patient access, health technology assessment, safety evaluation, and post-approval lifecycle management.1,9,10 However, RWE is currently not as widely used in pediatrics, and there is a need to further leverage administrative and EHR databases to study the effectiveness and safety of medications in pediatric populations.2,11
As RWD becomes increasingly important in epidemiologic and pharmacoepidemiologic research,11,12 an overview of RWD sources available to researchers is needed. In 2015, McMahon et al. cataloged 20 unique databases in North America that captured post-marketing safety data in children, including adverse events to pharmacotherapy.13 However, that article did not include databases outside of North America or address databases that could be used for research questions beyond drug safety. Moreover, there is limited literature providing details about high-quality RWD sources for pediatric patients, including their geographic coverage and how outside researchers can gain access to these valuable data sources. In addition, interest in and use of RWD have increased, and new databases have emerged since 2015. Therefore, we aimed to provide an overview of globally available RWD sources, including data attributes and capabilities, that can be used for pediatric pharmacoepidemiologic and other research, with a focus on medications and vaccines.
METHODS
Project Team
An interdisciplinary study team was formed within the International Society for Pharmacoepidemiology (ISPE) Pediatrics Special Interest Group (SIG). Our study team consisted of pediatricians, pharmacoepidemiologists, data scientists, and pharmacists who worked in the public sector, government, private sector, or academia. The study team was based in Germany, Israel, Italy, Switzerland, and the United States (US).
Pediatric RWD Questionnaire
The study team created an online questionnaire through Google Forms, which asked 46 questions about a pediatric RWD source and its attributes. Respondents could only complete the questionnaire about a database if it contained data on patients <18 years of age.
Database attributes collected included data source type, geographic and temporal coverage, data accessibility, data administration, available data linkages, size and age groups of the pediatric population covered, diagnoses and comorbidities, medication data, vaccine data, medical device data, laboratory and imaging results, genetic/biomarker data, narrative patient histories, and other patient-level data including height, weight, and vital signs.
Within the questionnaire, we defined “inpatient” as hospitalized patients, i.e., patients admitted to the hospital floor or intensive care unit for at least one night. We defined “outpatient” as anyone not hospitalized, including patients seen in ambulatory clinics, hospital-based clinics, outpatient surgical centers, and emergency departments.
Questionnaire Administration
The study team collected database attributes through two rounds of inquiries via the online questionnaire (Figure 1). In the first round, which began in September 2020, the questionnaire was sent to the ISPE RWE task force subgroup on RWD sources as well as to four SIGs within ISPE: Databases, Medications in Pregnancy and Lactation, Pediatrics, and Vaccines. Members of these groups completed questionnaires for pediatric databases they had used and were allowed to complete questionnaires for more than one database. Completed questionnaire responses were then shared with appropriate database representatives (i.e., vendor employees or professionally affiliated researchers) for verification. For the second round of inquiry, which began in February 2021, members of the same groups plus the study team were invited to nominate additional relevant databases. For each nominated database, study team members asked appropriate database representatives to complete the online questionnaire (May 2021-May 2022). Database representatives were given the opportunity to review, verify, and correct the information collected after initial questionnaire submission and again prior to manuscript submission.
Figure 1.
Questionnaire Responses and Verification Outcomes (N=93)
In some instances, multi-linkage systems such as national healthcare registries were collated into a single questionnaire response. For example, Nordic countries maintain various healthcare registries such as birth registries and prescription registries, which are frequently linked and studied in combination.
Statistical Analysis
Detailed attributes of verified RWD sources were combined into a single dataset for descriptive statistical analysis using pivot tables in Microsoft Excel. When representatives followed up with additional details beyond the questionnaire’s original specifications, these additional data were cleaned and put into a structured format (Appendix I).
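The study team built these summaries with Excel pivot tables. As an illustration of the same counting logic only (not the team's actual workbook), the Python sketch below assumes a hypothetical layout in which each verified database's answer to a "check all that apply" question is stored as a set of ticked options; the database names and records are invented toy data. It shows why such percentages use the total number of databases as the denominator and can therefore sum to more than 100%, and it reproduces the 91% "all ages" figure from the Table 1 age-group counts.

```python
from collections import Counter

# Hypothetical toy records (NOT study data): one entry per verified database,
# holding the set of options ticked for a "check all that apply" question.
responses = {
    "db_A": {"EHR/EMR", "Administrative claims"},
    "db_B": {"Administrative claims"},
    "db_C": {"EHR/EMR", "Survey data"},
    "db_D": {"Disease registry"},
}


def tabulate_multiselect(records):
    """Return {option: (n, %)} with % computed over ALL databases (fixed denominator)."""
    n_databases = len(records)
    counts = Counter(opt for options in records.values() for opt in options)
    return {
        opt: (n, round(100 * n / n_databases, 1))
        for opt, n in counts.most_common()
    }


table = tabulate_multiselect(responses)
for option, (n, pct) in table.items():
    print(f"{option:<22} n={n:<2} {pct:5.1f}%")

# A database can tick several options, so the percentages can exceed 100% in total.
total_pct = sum(pct for _, pct in table.values())
print(total_pct)  # 150.0

# Single-select question using the age-group counts reported in Table 1 (N=55):
# "all ages" combines the 0 to <18 and 0 to >=18 categories.
age_counts = {"0 to <2": 1, "0 to <12": 1, "0 to <18": 3, "0 to >=18": 47, "2 to >=18": 3}
all_ages = age_counts["0 to <18"] + age_counts["0 to >=18"]
print(round(100 * all_ages / sum(age_counts.values()), 1))  # 90.9
```

The fixed denominator is the design choice that matters here: dividing each option count by the number of databases (rather than by the number of ticks) keeps each percentage interpretable as "share of databases with this attribute," which is how Table 1 reports multi-select items.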
RESULTS
The first round of questionnaires yielded 36 unique responses, and the second round yielded 57 additional unique responses, for a total of 93 nominated RWD sources (Figure 1). For 38 databases (41%), questionnaires were either not completed or could not be verified, and these databases were omitted from the study (Appendix II). Databases were most commonly omitted because an appropriate database representative could not be reached to verify the data (n=26) or because the database representative referred the study team to publicly available database information instead of completing the questionnaire (n=10).
The final dataset included 55 unique pediatric RWD sources with verified attributes (Table 1, Appendix I). Tables 2-4 characterize selected attributes of these databases within each of the three following regional categories: Europe (Table 2), US (Table 3), and a composite group inclusive of multiregional, Asian-Pacific, and South American RWD sources (Table 4).
Table 1.
Overview of Database Characteristics (N=55)
| Category | Database feature | Response | n | %* |
|---|---|---|---|---|
| Geographical coverage | Continent covered | Europe | 26 | 47.3% |
| | | North America (US) | 21 | 38.2% |
| | | Multiregion | 4 | 7.3% |
| | | Asia-Pacific | 3 | 5.5% |
| | | South America (Brazil) | 1 | 1.8% |
| | Geographical area(s) captured (check all that apply) | National / Country | 45 | 81.8% |
| | | State / Province | 17 | 30.9% |
| | | Region | 25 | 45.5% |
| | | City | 7 | 12.7% |
| | | Institution Specific | 7 | 12.7% |
| | | Other | 10 | 18.2% |
| Data type | Type(s) of data utilized (check all that apply) | Electronic Health Record (EHR) / Electronic Medical Record (EMR) | 26 | 47.3% |
| | | Administrative claims data | 23 | 41.8% |
| | | Disease registry | 7 | 12.7% |
| | | Survey data | 6 | 10.9% |
| | | Other | 21 | 38.2% |
| Start of data available | Calendar year | Start pre-2000 | 20 | 36.4% |
| | | Start 2000-2009 | 22 | 40.0% |
| | | Start 2010+ | 11 | 20.0% |
| | | Start not specified | 2 | 3.6% |
| Size | Number of children (or events) captured (approximations) | <500,000 | 10 | 18.2% |
| | | 500,000 to <1 million | 3 | 5.5% |
| | | 1 to <5 million | 14 | 25.5% |
| | | 5 to 20 million | 12 | 21.8% |
| | | ≥20 million | 6 | 10.9% |
| | | Variable | 4 | 7.3% |
| | | Not sure, unknown | 6 | 10.9% |
| Age groups | Patient age captured | 0 to <2 years | 1 | 1.8% |
| | | 0 to <12 years | 1 | 1.8% |
| | | 0 to <18 years | 3 | 5.5% |
| | | 0 to ≥18 years | 47 | 85.5% |
| | | 2 to ≥18 years | 3 | 5.5% |
| Access to database | Outside access for researchers (check all that apply) | Data are publicly available | 10 | 18.2% |
| | | Data access is limited based on approval or license from vendor | 19 | 34.5% |
| | | Data access is limited based on approval or license from vendor (for non-commercial investigators only) | 5 | 9.1% |
| | | Data access is limited only to local investigators, who can collaborate with outside investigators | 15 | 27.3% |
| | | Data access is limited only to local investigators | 3 | 5.5% |
| | | Other | 18 | 32.7% |
| | | No response | 0 | 0.0% |
| | Administrator of data to researchers (check all that apply) | Government Entity | 21 | 38.2% |
| | | Private Sector | 15 | 27.3% |
| | | Academic Institution | 11 | 20.0% |
| | | Non-Profit Organization | 9 | 16.4% |
| | | Other | 9 | 16.4% |
| Linkage | Ability to link with other databases for research | Yes | 36 | 65.5% |
| | | No | 17 | 30.9% |
| | | I'm not sure | 2 | 3.6% |
| Medication data for children | Medication data: Inpatient | Yes | 27 | 49.1% |
| | | No | 27 | 49.1% |
| | | I'm not sure | 1 | 1.8% |
| | Inpatient: Types of medications captured (check all that apply) | Prescribed Medications | 18 | 32.7% |
| | | Administered Medications | 15 | 27.3% |
| | | Other | 8 | 14.5% |
| | | Not captured | 27 | 49.1% |
| | Medication data: Outpatient | Yes | 51 | 92.7% |
| | | No | 4 | 7.3% |
| | Outpatient: Types of medications captured (check all that apply) | Dispensed Medications | 39 | 70.9% |
| | | Prescribed Medications | 35 | 63.6% |
| | | Over-the-Counter Medications | 7 | 12.7% |
| | | Other | 4 | 7.3% |
| | | Not captured | 3 | 5.5% |
| Vaccine data for children | Vaccine data | Yes | 39 | 70.9% |
| | | No | 14 | 25.5% |
| | | I'm not sure | 2 | 3.6% |
| | Inpatient vaccine data | Yes | 14 | 25.5% |
| | | No | 34 | 61.8% |
| | | I'm not sure | 7 | 12.7% |
| | Outpatient vaccine data | Yes | 34 | 61.8% |
| | | No | 17 | 30.9% |
| | | I'm not sure | 4 | 7.3% |
| | Vaccine manufacturer data | Always | 6 | 10.9% |
| | | Sometimes | 24 | 43.6% |
| | | Never | 19 | 34.5% |
| | | I'm not sure | 6 | 10.9% |
| | Vaccines administered to patients through walk-in clinics or via other vendors | Always | 5 | 9.1% |
| | | Sometimes | 20 | 36.4% |
| | | Never | 21 | 38.2% |
| | | I'm not sure | 9 | 16.4% |
| Device data for children | Device data | Yes | 30 | 54.5% |
| | | No | 22 | 40.0% |
| | | I'm not sure | 3 | 5.5% |
| | Inpatient device data (e.g., ventilators, CR monitors) | Yes | 25 | 45.5% |
| | | No | 27 | 49.1% |
| | | I'm not sure | 3 | 5.5% |
| | Outpatient device data (e.g., thermometers, Fitbit devices) | Yes | 17 | 30.9% |
| | | No | 31 | 56.4% |
| | | I'm not sure | 7 | 12.7% |
| Additional patient details for children | Height recorded | Yes, on a regular basis with frequent updates | 17 | 30.9% |
| | | Yes, but rarely updated | 15 | 27.3% |
| | | No | 22 | 40.0% |
| | | I'm not sure | 1 | 1.8% |
| | Weight recorded | Yes, on a regular basis with frequent updates | 18 | 32.7% |
| | | Yes, but rarely updated | 16 | 29.1% |
| | | No | 19 | 34.5% |
| | | I'm not sure | 2 | 3.6% |
| | Vital signs captured | Inpatient and outpatient | 14 | 25.5% |
| | | Outpatient only | 12 | 21.8% |
| | | Inpatient only | 2 | 3.6% |
| | | Yes, inpatient/outpatient not specified | 2 | 3.6% |
| | | No | 24 | 43.6% |
| | | I'm not sure | 1 | 1.8% |
| | Lab results | Inpatient and outpatient | 18 | 32.7% |
| | | Outpatient only | 12 | 21.8% |
| | | Inpatient only | 1 | 1.8% |
| | | Yes, inpatient/outpatient not specified | 1 | 1.8% |
| | | No | 22 | 40.0% |
| | | I'm not sure | 1 | 1.8% |
| | Imaging | Inpatient and outpatient | 13 | 23.6% |
| | | Outpatient only | 6 | 10.9% |
| | | Inpatient only | 1 | 1.8% |
| | | Yes, inpatient/outpatient not specified | 3 | 5.5% |
| | | No | 29 | 52.7% |
| | | I'm not sure | 3 | 5.5% |
| | Narrative Patient History | Inpatient and outpatient | 9 | 16.4% |
| | | Outpatient only | 7 | 12.7% |
| | | Inpatient only | 0 | 0.0% |
| | | Yes, inpatient/outpatient not specified | 3 | 5.5% |
| | | No | 35 | 63.6% |
| | | I'm not sure | 1 | 1.8% |
| | Diagnosis or comorbidity data | Yes | 49 | 89.1% |
| | | No | 6 | 10.9% |
| | Genetic/biomarker data | Yes | 12 | 21.8% |
| | | No | 40 | 72.7% |
| | | I'm not sure | 3 | 5.5% |
* Percentages for some variables may not sum to 100% because some questions allowed more than one response (“check all that apply”).
Table 2.
Selected Attributes by Database – Europe (N=26)
| Database | Data Type(s) Included | | | Data Elements Captured a | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | Claims | EHR/EMR | Other | Height or Weight | Genetic/Biomarker | Outpatient Medications | Inpatient Medications | Vaccines | Diagnoses or Comorbidities | Laboratory Test Results |
| Base de datos para la Investigación Farmacoepidemiológica en el Ámbito Público (BIFAP) - Spain | X | X | X | X | X | X | X | |||
| Clinical Practice Research Datalink (CPRD) – United Kingdom | X | X | X | X | X | X | ||||
| Danish National Healthcare Registries | X | X | X | X | X | |||||
| electronic Data Research and Innovation Service (eDRIS) - Scotland | X | X | X | X | X | X | X | |||
| German Pharmacoepidemiological Research Database (GePaRD) | X | X | X | X | ||||||
| Hospital Treatment Insights (HTI) - England | X | X | X | X | X | |||||
| InGef Research Database - Germany | X | X | X | X | ||||||
| Integrated Primary Care Information (IPCI) - Netherlands | X | X | X | X | X | |||||
| Lazio Drug Claims Registry - Italy | X | X | ||||||||
| Lombardy Health Database - Italy | X | X | X | |||||||
| National Health Data System (SNDS) - France | X | X | X | X | X | X | X | |||
| NorPreSS- Iceland | X | X | X | X | ||||||
| Norwegian National Healthcare Registries | X | X | X | X | ||||||
| Pedianet - Italy | X | X | X | X | X | X | ||||
| PrescriptiOns Médicaments Mères Enfants (POMME) - France | X | X | X | X | X | X | ||||
| SANItà a centralità dell’Assistito e della Risposta Prescrittiva (SANIARP) - Italy | X | X | X | X | X | X | ||||
| Secure Anonymised Information Linkage (SAIL) - Wales | X | X | X | X | X | X | X | X | ||
| SIDIAP - Spain | X | X | X | X | X | X | ||||
| Statistics Database of Estonian Health Insurance Fund | X | X | X | X | ||||||
| Swedish National Healthcare Registries | X | X | X | X | ||||||
| The Finnish Prescription Register (FPR) | X | X | ||||||||
| The Health Improvement Network (THIN) – Multiple Countries b | X | X | X | X | X | X | X | X | ||
| The PHARMO Database Network & PHARMO Perinatal Research Network (PPRN) - Netherlands | X | X | X | X | X | X | X | X | X | |
| The Prescription Centre in Kanta Database - Finland | X | X | ||||||||
| Valencia Health System Integrated Database - Spain | X | X | X | X | X | X | X | |||
| WIG2 Benchmark Database - Germany | X | X | X | X | X | |||||
X = captured or partially captured, <blank> = not captured or not sure whether element is captured or not. Some data elements marked as captured may be partially captured or updated on an infrequent basis. Additional details can be found in Appendix I.
The Health Improvement Network (THIN) data come from Belgium, France, Germany, Italy, Romania, Spain, and the UK. THIN is a wholly owned subsidiary of Cegedim SA, which owns the proprietary rights to THIN data.
Table 3.
Selected Attributes by Database – United States (N=21)
| Database | Data Type(s) Included | | | Data Elements Captured a | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | Claims | EHR/EMR | Other | Height or Weight | Genetic/Biomarker | Outpatient Medications | Inpatient Medications | Vaccines | Diagnoses or Comorbidities | Laboratory Test Results |
| America's Poison Centers National Poison Data System (NPDS) | X | X | X | X | ||||||
| Cerner Real-World Data | X | X | X | X | X | X | X | |||
| Collaborative Effectiveness Research through Collaborative Electronic Reporting (CER2) | X | X | X | X | X | X | ||||
| Explorys | X | X | X | X | X | X | X | X | ||
| Carelon Research Healthcare Integrated Research Database | X | X | X | X | X | X | X | X | X | X |
| IQVIA Ambulatory EMR-US (AEMR-US) | X | X | X | X | X | X | X | |||
| IQVIA Longitudinal Access and Adjudicated Data (LAAD) | X | X | X | X | X | X | X | |||
| IQVIA PharMetrics® Plus | X | X | X | X | X | X | X | |||
| MarketScan Commercial & Encounters Database | X | X | X | X | X | X | ||||
| MarketScan MultiState Medicaid Database | X | X | X | X | X | |||||
| Medical Product Safety Network (MedSun) | X | X | X | X | X | |||||
| National Ambulatory Medical Care Survey (NAMCS) | X | X | X | X | ||||||
| National Electronic Injury Surveillance System (NEISS) | X | X | ||||||||
| National Health and Nutrition Examination Survey (NHANES) | X | X | X | X | X | X | X | |||
| National Hospital Ambulatory Medical Care Survey (NHAMCS) | X | X | X | X | ||||||
| Optum Electronic Health Records Research Database | X | X | X | X | X | X | X | X | ||
| Optum Claims Research Database | X | X | X | X | ||||||
| Pediatric Health Information System (PHIS) | X | X | X | X | X | |||||
| Pediatrix Clinical Data Warehouse | X | X | X | X | X | X | ||||
| PEDSnet | X | X | X | X | X | X | X | X | ||
| Sentinel Distributed Database b | X | X | X | X | X | X | X | X | ||
X = captured or partially captured, <blank> = not captured or not sure whether element is captured or not. Some data elements marked as captured may be partially captured or updated on an infrequent basis. Additional details can be found in Appendix I.
The Sentinel Distributed Database includes multiple databases within the United States, including databases operated by Aetna, Carelon, Humana, Kaiser Permanente, Optum, and several other data partners that are not included in this table.
Table 4.
Selected Attributes by Database – Multiregion, Asia-Pacific, and South America (N=8)
| Database | Data Type(s) Included | | | Data Elements Captured a | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | Claims | EHR/EMR | Other | Height or Weight | Genetic/Biomarker | Outpatient Medications | Inpatient Medications | Vaccines | Diagnoses or Comorbidities | Laboratory Test Results |
| Multiregion | ||||||||||
| FDA Adverse Event Reporting System (FAERS) | X | X | X | X | X | X | ||||
| Go-PGx: Genomic & Outcomes Databank for Pharmacogenomic & Implementation Studies | X | X | X | X | X | X | X | X | ||
| TriNetX | X | X | X | X | X | X | X | X | ||
| Vigibase | X | X | X | X | X | |||||
| Asia-Pacific | ||||||||||
| Chang Gung Research Database (CGRD) - Taiwan | X | X | X | X | X | X | X | X | X | |
| EBM Provider - Japan | X | X | X | X | X | X | ||||
| Maccabi Health Care Services - Israel | X | X | X | X | X | X | ||||
| South America | ||||||||||
| VigiMed - Brazil | X | X | X | X | X | X | X | |||
X = captured or partially captured, <blank> = not captured or not sure whether element is captured or not. Some data elements marked as captured may be partially captured or updated on an infrequent basis. Additional details can be found in Appendix I.
All but one of the databases described were still in operation and capturing data as of December 2022; the exception, NorPreSS - Iceland, is no longer actively collecting data.14
To assist researchers, we have included two peer-reviewed publications for each of the 55 databases (Appendix III). These publications are intended as illustrative examples of how the databases have been used in pediatric research. They are not exhaustive, however, and may not fully reflect the databases’ potential uses.
Geographical and Temporal Coverage
Among the 55 verified RWD sources, most were located in Europe (n=26, 47%), followed by North America (all US) (n=21, 38%), multiregion (n=4, 7%), Asia-Pacific (n=3, 5%), and South America (n=1, 2%) (Table 1). Most databases reported having nationwide coverage (82%), while 45% reported capturing regional data and 31% state- or province-level data (categories not mutually exclusive). Within Europe, databases were based in the following countries: Italy (n=4), United Kingdom (UK) or specific UK countries (n=4), Germany (n=3), Spain (n=3), France (n=2), the Netherlands (n=2), Finland (n=2), Denmark (n=1), Iceland (n=1), Norway (n=1), Estonia (n=1), and Sweden (n=1). One database, The Health Improvement Network (THIN), included data from multiple European countries.
Dates of available data are shown in Table 1 and Appendix I. Some databases date back to the 1960s (e.g., the FDA Adverse Event Reporting System (FAERS), the Norwegian National Healthcare Registries, and Vigibase), others to the 1970s (e.g., the Danish National Healthcare Registries and the National Ambulatory Medical Care Survey (NAMCS)), and others to the 1980s (e.g., America's Poison Centers National Poison Data System (NPDS) and the Clinical Practice Research Datalink (CPRD)).
Database Type
The database types most frequently reported were EHR/EMR (47%) and administrative claims (42%). Other verified databases were disease registries (13%) or included survey data (11%). Among the 26 verified European databases, seven (27%) were administrative claims only, six (23%) were EHR/EMR only, and 13 (50%) either combined different types of data (e.g., EHR/EMR with administrative claims) or were other (non-claims, non-EHR/EMR) database types. Among the 21 verified databases from the United States, six (29%) were administrative claims only, seven (33%) were EHR/EMR only, two (10%) combined administrative claims and EHR/EMR, and six (29%) represented other types of data such as surveys.
Database Size
The number of children or pediatric observations varied considerably across databases. For example, 18% contained fewer than 500,000 children/observations, while 11% contained 20 million or more.
Age Groups
The majority of databases (91%) included children of all ages, defined as capturing patients 0 to <18 years old or 0 to ≥18 years old. Other databases held patient populations restricted to 0 to <2 years of age (n=1), 0 to <12 years of age (n=1), or 2 years of age and older (n=3).
Access to Database and Linkage
Only 10 databases (18%) make their data publicly available. In contrast, 71% of databases limit access by outside investigators (e.g., by approval, license, or through collaboration with local investigators). Three databases stated that their data are available only to local investigators, with no access for outside researchers. Two-thirds (65%) of the 55 verified databases can be linked with other databases for research purposes.
Medication Data
Inpatient medication data are captured by nearly half (49%) of the 55 databases, with 18 capturing prescribed inpatient medications and 15 capturing administered inpatient medications. Almost all of the 55 databases (93%) contained information on outpatient medications, with 35 capturing prescribed outpatient medications, 39 capturing dispensed outpatient medications, and seven capturing over-the-counter medications. Among European databases, 10 (38%) captured data for both inpatient and outpatient medications, while 16 (62%) captured data for outpatient medications only. Among US databases, nine (43%) captured data for both inpatient and outpatient medications, nine (43%) captured data for outpatient medications only, and two (10%) captured inpatient medications only. One database, the National Electronic Injury Surveillance System (NEISS), did not systematically collect data on inpatient or outpatient medications but did have data on poisonings. All eight multiregional, Asian-Pacific, or South American databases included medication data, seven of which included both inpatient and outpatient medications (Table 4).
Vaccine and Device Data
Vaccine information is included in 39 (71%) of the 55 databases. However, vaccine manufacturer data are always captured in only 11% of databases, while an additional 44% capture such data only sometimes. Vaccine data were more commonly reported in outpatient settings (34 databases) than in inpatient settings (14 databases). By region, 16 US databases (76%), 18 European databases (69%), and five other (multiregional/Asian-Pacific/South American) databases (63%) included some vaccine data. Device data are contained in 55% of verified databases: 45% from inpatient settings and 31% from outpatient settings.
Additional Patient Details
About one-third of the 55 databases regularly record pediatric height (31%) and/or weight (33%), and over half contain at least some pediatric height and weight data. Over half of verified databases contain vital sign and lab result data, while over one-third contain imaging results (42%) and narrative historical data (35%). Most of the 55 databases record diagnosis and comorbidity data (89%), but only about one-fifth (22%) report having information on genetics and biomarkers.
DISCUSSION
To our knowledge, this is the first study to review globally available RWD sources that can be used for pediatric pharmacoepidemiologic research. Most verified databases profiled in this study are located in Europe and North America, have nationwide coverage, and contain EHR/EMR data and/or administrative claims data. Most profiled databases include children of all ages (0 to <18 years of age); most of these also contain data on adults, while a handful (e.g., Collaborative Effectiveness Research through Collaborative Electronic Reporting (CER2), Pedianet, Pediatrix Clinical Data Warehouse) are limited to the pediatric population. Two-thirds of verified data sources can be linked to other datasets. Nearly three-quarters of verified databases permit limited access to outside investigators (e.g., via license or collaboration), whereas some databases are publicly available and a handful are restricted to local investigators.
Prior efforts have cataloged and described RWD sources useful for pharmacoepidemiologic research in pediatric populations. A study from the Task Force in Europe for Drug Development for the Young (TEDDY) Network of Excellence profiled 17 European databases useful for pediatric drug safety and utilization research, containing information on over 9 million children.15 A follow-up study by the same group described the characteristics of 15 European databases useful for research specifically on the safety of attention deficit hyperactivity disorder (ADHD) medications.16 In another paper, we characterized nine databases from North America that may be used for post-marketing safety surveillance in children.13 The current study, which provides updated profiles of many of the databases in these prior publications, is more extensive: it includes additional databases across multiple continents as well as information (e.g., available growth and laboratory data, access, linkage) useful for various pharmacoepidemiologic, other epidemiologic, and health services research studies. Other compilations of databases useful in pediatric research include a survey of 34 databases from the Global Research in Paediatrics Network of Excellence (GRiP)17 and the ENCePP Resources Database,18 an online catalog of databases that can be searched for settings with children.
Collectively, the proportions of databases with specific types of data (e.g., outpatient medications, inpatient medications, vaccines, diagnoses, lab results, etc.) are fairly similar across regions. Four in five verified databases reported having national coverage, although few databases comprehensively captured data from an entire nation’s pediatric population. Many databases also captured more granular geographic information, in many cases at the level of regions, states, or provinces, and less commonly at the level of specific cities or institutions.
All North American databases profiled in this paper come from the US. Unlike European countries with national healthcare systems and population-representative data sources (e.g., Danish Healthcare Registries, Clinical Practice Research Datalink, Pedianet), the US lacks a national healthcare system and, correspondingly, nationally representative RWD sources. Nonetheless, the US also has large data sources collectively covering substantial portions of the US pediatric population including various databases maintained by private companies (e.g., Cerner, Carelon Research, IQVIA, Optum, Truven) as well as the Sentinel Distributed Database hosted by the US Food and Drug Administration. Sentinel is the US’s largest patient database and among the largest pediatric RWD sources profiled, combining data from multiple entities to reach a total patient sample of 365.1 million, including 43.8 million (12%) pediatric patients.19 Some databases described in our study, such as those administered by Carelon and Optum, contribute data to Sentinel.20 A couple verified databases outside the US also include relatively large pediatric populations, including France’s National Health Data System (~20 million children) and the multiregional TriNetX database (~23 million children).
Almost all verified databases (93%) contain information on outpatient medications, and half (49%) have inpatient medication data, presenting a substantial opportunity to study the utilization and effects of drugs in pediatric populations in routine clinical practice. Over two-thirds (71%) of verified databases also contain vaccine data, albeit with variable details about manufacturers and settings of administration. Additionally, over half (55%) of verified databases contain data on devices, which are particularly understudied in pediatric populations. Of note, certain databases profiled in this paper (e.g., NEISS) do not systematically capture drug information but can be used to study toxicity and poisonings.21,22 Some profiled databases may be particularly valuable to researchers and other users of pediatric data seeking specialized data that are less widely available. For example, genetic and biomarker data are captured by fewer than one-quarter (22%) of verified databases. Only about one-third of verified databases routinely capture height (31%) and weight (33%) data, which can be useful in characterizing pediatric medication exposures (e.g., weight- or body surface area-based dosage), confounders (e.g., obesity, underweight), and effect modification by body size.
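Height and weight matter for exposure characterization because pediatric doses are often expressed relative to body size, which requires a measurement taken reasonably close to the dispensing date. The sketch below is a minimal illustration under stated assumptions. It uses hypothetical (date, value) measurement lists and the helper names `latest_on_or_before` and `normalized_doses`, which are introduced here for illustration only. For body surface area (BSA) it uses the Mosteller formula, one of several published BSA formulas. It is not a method used or endorsed in this study.

```python
import math
from datetime import date


def latest_on_or_before(measurements, index_date):
    """Value of the most recent (date, value) pair on or before index_date, else None."""
    eligible = [(d, v) for d, v in measurements if d <= index_date]
    if not eligible:
        return None
    return max(eligible, key=lambda pair: pair[0])[1]


def mosteller_bsa(height_cm, weight_kg):
    """Body surface area in m^2 via Mosteller: sqrt(height_cm * weight_kg / 3600)."""
    if height_cm <= 0 or weight_kg <= 0:
        raise ValueError("height and weight must be positive")
    return math.sqrt(height_cm * weight_kg / 3600.0)


def normalized_doses(daily_dose_mg, heights, weights, index_date):
    """Return (mg/kg/day, mg/m^2/day) from the latest measurements on or before index_date.

    Either value is None when a needed measurement is missing.
    """
    height = latest_on_or_before(heights, index_date)
    weight = latest_on_or_before(weights, index_date)
    per_kg = daily_dose_mg / weight if weight is not None else None
    if height is not None and weight is not None:
        per_m2 = daily_dose_mg / mosteller_bsa(height, weight)
    else:
        per_m2 = None
    return per_kg, per_m2


if __name__ == "__main__":
    # Hypothetical child: two weight records and one height record.
    weights = [(date(2022, 1, 10), 22.0), (date(2022, 9, 5), 25.0)]
    heights = [(date(2022, 9, 5), 144.0)]
    print(normalized_doses(500.0, heights, weights, date(2022, 10, 1)))
```

Because the most recent measurement may be months old, a study would also need a rule for how old a measurement may be before it is treated as missing. This sketch leaves that study-specific choice open.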
This study has several notable strengths. We have described in a single place many diverse RWD sources available for pediatric pharmacoepidemiologic and health services research globally. The inclusion of databases with pediatric data from around the world is an advance over other compendia, which have frequently focused on pediatric databases in specific regions or continents. Many databases described in our study are not widely known and could be valuable resources for pediatric research, even if no pediatric studies using them have yet been published. For certain other databases that are well known, specific attributes and capabilities may not be readily identifiable from publicly available or previously published resources. Our study provides detailed information (including sample publications, Appendix III) to help researchers determine which RWD sources are suitable and fit-for-purpose for specific projects. Additionally, information about the databases profiled has been verified by vendor representatives and database holders, increasing confidence in its accuracy.
Certain limitations should also be noted. Many nominated databases were not included in the study, either because details were not provided (sometimes because information was publicly available) or because the details provided were not verified by database representatives. However, these resources may represent other potentially valuable settings for pediatric pharmacoepidemiologic research (Appendix II, Unverified Databases). Moreover, other databases not nominated by survey respondents might also have met the inclusion criteria of our study. Database attributes (e.g., pediatric population size) also change regularly; our study presents a snapshot of what was available when the questionnaires were completed and verified. Finally, the questionnaire did not capture all available attributes of each database, including detailed information on devices or biologics.
CONCLUSION
This study provides an overview of attributes and capabilities of RWD sources from around the world that can be used for pharmacoepidemiologic and other research in pediatric populations. While the list of databases included is not exhaustive, we have profiled a diverse array of institutional, regional, national, and global databases with distinct populations and types of data, often covering the entire age spectrum but in some cases limited to infants or children. We present information about geographic and temporal coverage, ages covered, available types of data and data elements, timeliness of data, data access, and examples of research using these resources. These details should allow researchers from various sectors to identify fit-for-purpose RWD sources useful for pediatric investigations. Future efforts should aim to maintain up-to-date, more comprehensive, and searchable information about these databases, while also profiling additional RWD sources that prove valuable for pediatric pharmacoepidemiologic research.
Supplementary Material
APPENDIX I: Verified Questionnaire Responses (N=55) - see separate Excel spreadsheet
APPENDIX II: Unverified Databases (N=38) - see separate file
APPENDIX III: Selected Publications Using Verified Databases (N=55) - see separate file
APPENDIX IV: Copy of the Survey - see separate file
Key Points:
This study provides a review of 55 globally available real-world data (RWD) sources that can be used for pediatric pharmacoepidemiologic, epidemiologic, and health services research.
Most verified databases are located in Europe (47%) or the United States (38%) and have nationwide coverage (82%); the most frequent database types are electronic health/medical records (47%) and administrative claims (42%), and almost all databases include children of all ages (91%), usually in data sources also containing adult data.
Inpatient medication data are captured by 49% of verified databases, and almost all (93%) verified databases contain information on outpatient medications.
A majority (71%) of verified databases capture vaccine data, while over half (55%) contain device data.
ACKNOWLEDGEMENTS:
The views and recommendations expressed in this article are the personal views of the authors, who are members of the International Society for Pharmacoepidemiology (ISPE) Pediatrics Special Interest Group, and do not necessarily reflect those of the ISPE. This article is not an ISPE document. These personal views should also not be understood or quoted as being those of the institutions or organizations with which individual authors are affiliated. The authors thank all ISPE members and database representatives who provided information about the databases.
Funding statement:
This work was supported by grant funding from the National Institutes of Health (R01AR074436, R01HD109335, R61HD105619, R33HD105619, UL1TR003017).
Footnotes
Conflict of Interest:
GTW, AWM, OS, and CB declare no conflict of interest. DB is an employee of Takeda. MB is a full-time employee of Merck Sharp & Dohme LLC, a subsidiary of Merck & Co., Inc., Rahway, NJ, USA, and holds stock in Merck & Co., Inc., Rahway, NJ, USA. NM and MS are employees of EpidStrategies, a Division of ToxStrategies, and have received various research grants to conduct pharmacoepidemiology studies using the databases described in this study. CF is a member of the academic spin-off INSPIRE SRL ("INnovative Solutions for medical Prediction and big data Integration in REal world setting"), which has received funding for conducting observational studies from various pharmaceutical companies. SK is an employee of Teva Pharmaceutical Industries Ltd. JWS is currently an employee of Pfizer Inc. DBH receives salary support and research funding from the Childhood Arthritis and Rheumatology Research Alliance.
Ethics Statement: The authors state that no ethical approval was needed because this research did not involve human subjects, and no human subject data were captured, extracted, or analyzed.
Data Availability Statement:
The datasets used and/or analyzed during the current study are available within this article and its figures, tables, and appendices.
REFERENCES
1. Radawski CA, Hammad TA, Colilla S, et al. The utility of real-world evidence for benefit-risk assessment, communication, and evaluation of pharmaceuticals: Case studies. Pharmacoepidemiol Drug Saf. 2020;29(12):1532–1539. doi: 10.1002/pds.5167
2. Lasky T, Carleton B, Horton DB, et al. Real-World Evidence to Assess Medication Safety or Effectiveness in Children: Systematic Review. Drugs Real World Outcomes. 2020;7(2):97–107. doi: 10.1007/s40801-020-00182-y
3. Katkade VB, Sanders KN, Zou KH. Real world data: an opportunity to supplement existing evidence for the use of long-established medicines in health care decision making. J Multidiscip Healthc. 2018;11:295–304. doi: 10.2147/JMDH.S160029
4. Dagenais S, Russo L, Madsen A, Webster J, Becnel L. Use of Real-World Evidence to Drive Drug Development Strategy and Inform Clinical Trial Design. Clin Pharmacol Ther. 2022;111(1):77–89. doi: 10.1002/cpt.2480
5. Flynn R, Plueschke K, Quinten C, et al. Marketing Authorization Applications Made to the European Medicines Agency in 2018-2019: What was the Contribution of Real-World Evidence? Clin Pharmacol Ther. 2022;111(1):90–97. doi: 10.1002/cpt.2461
6. Frieden TR. Evidence for Health Decision Making - Beyond Randomized, Controlled Trials. N Engl J Med. 2017;377(5):465–475. doi: 10.1056/NEJMra1614394
7. Ittenbach RF, Corsmo JJ, Kissling AD, Strauss AW. How many minors are participating in clinical research today? An estimate and important lessons learned. J Clin Transl Sci. 2021;5(1):e179. doi: 10.1017/cts.2021.844
8. Zhong Y, Zhang X, Zhou L, Li L, Zhang T. Updated analysis of pediatric clinical studies registered in ClinicalTrials.gov, 2008-2019. BMC Pediatr. 2021;21(1):212. doi: 10.1186/s12887-021-02658-4
9. Franklin JM, Glynn RJ, Martin D, Schneeweiss S. Evaluating the Use of Nonrandomized Real-World Data Analyses for Regulatory Decision Making. Clin Pharmacol Ther. 2019;105(4):867–877. doi: 10.1002/cpt.1351
10. Pacurariu A, Plueschke K, McGettigan P, et al. Electronic healthcare databases in Europe: descriptive analysis of characteristics and potential for use in medicines regulation. BMJ Open. 2018;8(9):e023090. doi: 10.1136/bmjopen-2018-023090
11. Horton DB, Blum MD, Burcu M. Real-World Evidence for Assessing Treatment Effectiveness and Safety in Pediatric Populations. J Pediatr. 2021;238:312–316. doi: 10.1016/j.jpeds.2021.06.062
12. McMahon AW, Pan GD. Assessing Drug Safety in Children - The Role of Real-World Data. N Engl J Med. 2018;378(23):2155–2157. doi: 10.1056/NEJMp1802197
13. McMahon AW, Wharton GT, Bonnel R, et al. Pediatric post-marketing safety systems in North America: assessment of the current status. Pharmacoepidemiol Drug Saf. 2015;24(8):785–792. doi: 10.1002/pds.3813
14. NordForsk. Nordic Pregnancy drug Safety Studies - NorPreSS. 2023. Available from: https://www.nordforsk.org/projects/nordic-pregnancy-drug-safety-studies-norpress. Accessed May 26, 2023.
15. Neubert A, Sturkenboom MC, Murray ML, et al. Databases for pediatric medicine research in Europe--assessment and critical appraisal. Pharmacoepidemiol Drug Saf. 2008;17(12):1155–1167. doi: 10.1002/pds.1661
16. Murray ML, Insuk S, Banaschewski T, et al. An inventory of European data sources for the long-term safety evaluation of methylphenidate. Eur Child Adolesc Psychiatry. 2013;22(10):605–618. doi: 10.1007/s00787-013-0386-x
17. Ferrajolo C. Drug Safety in Children: Focus on hepatic concerns. Erasmus University Rotterdam. 2014. Available from: hdl.handle.net/1765/77131. Accessed May 26, 2023.
18. European Network of Centres for Pharmacoepidemiology and Pharmacovigilance (ENCePP). ENCePP Resources Database. 2023. Available from: https://www.encepp.eu/encepp/search.htm. Accessed May 28, 2023.
19. Sentinel. Homepage | Sentinel Initiative. 2023. Available from: https://www.sentinelinitiative.org/. Accessed May 26, 2023.
20. Sentinel. Who Is Involved | Sentinel Initiative. 2023. Available from: https://www.sentinelinitiative.org/about/who-involved. Accessed May 26, 2023.
21. Budnitz DS, Lovegrove MC, Sapiano MR, et al. Notes from the Field: Pediatric Emergency Department Visits for Buprenorphine/Naloxone Ingestion - United States, 2008-2015. MMWR Morb Mortal Wkly Rep. 2016;65(41):1148–1149. doi: 10.15585/mmwr.mm6541a5
22. Francis M, Spiller HA, Badeti J, et al. Suspected suicides and nonfatal suicide attempts involving antidepressants reported to United States poison control centers, 2000-2020. Clin Toxicol (Phila). 2022;60(7):818–826. doi: 10.1080/15563650.2022.2041202

