2022 Jan 24;7:22. [Version 1] doi: 10.12688/wellcomeopenres.17403.1

Table 8. Database information.

ID Name Description Terms of Use Terms and conditions for the access
CPRD Clinical Practice Research Datalink
The Clinical Practice Research Datalink (CPRD) is
a governmental, not-for-profit research service,
jointly funded by the NHS National Institute for
Health Research (NIHR) and the Medicines and
Healthcare products Regulatory Agency (MHRA), a
part of the Department of Health, United Kingdom
(UK). CPRD consists of data collected from UK
primary care for all ages. This includes conditions,
observations, measurements, and procedures
that the general practitioner is made aware of, in
addition to any prescriptions issued by the
general practitioner. In addition to primary care,
there are also linked secondary care records for a
small number of people. The major data elements
contained within this database are outpatient
prescriptions given by the general practitioner
(coded with Multilex codes) and outpatient clinical,
referral, immunization or test events that the
general practitioner knows about (coded in Read or
ICD10 or LOINC codes). The database also contains
the patients’ year of birth and any date of death.
1) Please allow for 2 weeks lead time for all
publications using these results to go through
internal review process. 2) The results are considered
fit-for-use and were generated for this specific
protocol. Derivations from the intent of this protocol
are not validated by our institution. 3) Our institution
expects all authors to comply with all applicable
personal data protection rules (such as the European
Data Protection Regulation 2016/679, of April 27,
2016). 4) Our institution reserves the right to request
to omit our results from a drafted publication if the
findings could inflict reputational or institutional
harm.
https://www.cprd.com/
CU-AMC HDC U of Colorado Anschutz Medical Campus Health Data Compass (CU-AMC HDC)
Health Data Compass (HDC) is a multi-institutional
data warehouse. HDC contains inpatient and
outpatient electronic medical data including
patient, encounter, diagnosis, procedures,
medications, and laboratory results from two electronic
medical record systems (UCHealth and Children's
Hospital of Colorado), state-level all-payers
claims data, and the Colorado death registry.
Acknowledgement statement: Supported by the
Health Data Compass Data Warehouse project
(healthdatacompass.org).
1) Please allow for 2 weeks lead time for all
publications using these results to go through
internal review process. 2) When using our results,
you must always use this specific name when
referring to our database. No other labels should
be used in presenting our results. 3) The results are
considered fit-for-use and were generated for this
specific protocol. Derivations from the intent of this
protocol are not validated by our institution. 4) Our
institution reserves the right to request to omit our
results from a drafted publication if the findings
could inflict reputational or institutional harm.
https://www.healthdatacompass.org/
CUIMC Columbia University Irving Medical Center
The clinical data warehouse of NewYork-Presbyterian
Hospital/Columbia University Irving
Medical Center, New York, NY, based on its current
and previous electronic health record systems, with
data spanning over 30 years and including over 6
million patients.
Our institution reserves the right to request to omit
our results from a drafted publication if the findings
could inflict reputational or institutional harm. The
results are specific to a study and should not be
reused in other studies without review from our
institution. For consistency, the Columbia database
should be referred to as CUIMC.
https://www.cuimc.columbia.edu/about-us/explore-cuimc/contact-cuimc
gh13@columbia.edu
HealthVerity HealthVerity
This HealthVerity derived data set contains de-identified
patient information with an antibody
and/or diagnostic test for COVID-19 linked to all
available Medical Claims and Pharmacy Data from
select private data providers participating in the
HealthVerity marketplace.
1) Please allow for 2 weeks lead time for all
publications using these results to go through
internal review process. 2) The results are considered
fit-for-use and were generated for this specific
protocol. Derivations from the intent of this protocol
are not validated by our institution. 3) Our institution
expects all authors to comply with all applicable
personal data protection rules (such as the European
Data Protection Regulation 2016/679, of April 27,
2016). 4) Our institution reserves the right to request
to omit our results from a drafted publication if the
findings could inflict reputational or institutional
harm.
https://healthverity.com/license-healthcare-data-healthverity-marketplace/
HIRA Health Insurance Review & Assessment Service
National claims data from a single insurance service
in South Korea. It contains observational
medical records (both inpatient and
outpatient) of patients while they are eligible
for national medical insurance.
The Health Insurance Review & Assessment Service and
the Ministry of Health and Welfare jointly release
de-identified nationwide COVID-19 patient data and
conduct collaborative research with leading
academic and government organizations. Because the
raw data remain within the organization, cohort data
are managed through a result-sharing method in which
analysis code is run and only result values are
shared, without leakage of personal information.
https://www.hira.or.kr/eng/main.do
IPCI Integrated Primary Care Information
The Integrated Primary Care Information (IPCI)
database is collected from EHR records of
patients registered with 391 GPs throughout the
Netherlands. The database contains records from
approximately 2.6 million patients out of a Dutch
population of 17M (about 15%) starting in 1996.
1) Results can only be used within the intent of a
study that is approved by our governance board.
Additional derived studies from large-scale analysis
therefore require approval. 2) Inclusion of IPCI
researchers is required for these derived studies
to provide the proper context and interpretation of
these results.
https://www.ipci.nl/
IQVIA-OpenClaims IQVIA Open Claims
A United States database of open, pre-adjudicated
claims from January 2013 to May 2020. Data are
reported at the anonymized patient level and collected
from office-based physicians and specialists via
office management software and clearinghouse
switch sources for the purpose of reimbursement.
A subset of medical claims data have adjudicated
claims.
Inclusion of IQVIA researchers is required in
manuscripts using IQVIA data.
https://www.iqvia.com/solutions/real-world-evidence/real-world-data-and-insights
LPD-FRANCE LPD FRANCE
LPD France is a computerised network of
physicians including GPs who contribute to a
centralised database of anonymised patient
EMR. Currently, >1200 GPs from 400 practices
are contributing to the database covering
7.8M patients in France. The database covers
a time period from 1994 through the present.
Observation time is defined by the first and last
consultation dates. Drug information is derived
from GP prescriptions. Drugs obtained over the
counter by the patient outside the prescription
system are not reported.
Inclusion of IQVIA researchers is required in
manuscripts using IQVIA data.
https://www.iqvia.com/solutions/real-world-evidence/real-world-data-and-insights
LPDItaly IQVIA LPD Italy
LPD Italy is comprised of anonymised patient
records collected from software used by GPs
during an office visit to document patients’
clinical records. Data coverage includes over 2M
patient records with at least one visit and 119.5M
prescription orders across 900 GP practices.
Dates of service range from 2004 through the
present. Observation time is defined by the first
and last consultation dates. Drugs are captured
as prescription records with product, quantity,
dosing directions, strength, indication and date of
consultation.
Inclusion of IQVIA researchers is required in
manuscripts using IQVIA data.
https://www.iqvia.com/solutions/real-world-evidence/real-world-data-and-insights
OptumEhr Optum© de-identified Electronic Health Record Dataset
Optum© de-identified Electronic Health Record
Dataset is derived from dozens of healthcare
provider organizations in the United States (which
include more than 700 hospitals and 7,000 clinics),
treating more than 103 million patients receiving
care in the United States. The medical record
data includes clinical information, inclusive of
prescriptions as prescribed and administered, lab
results, vital signs, body measurements, diagnoses,
procedures, and information derived from clinical
notes using Natural Language Processing (NLP).
1) Please allow for 2 weeks lead time for all
publications using these results to go through
internal review process. 2) The results are considered
fit-for-use and were generated for this specific
protocol. Derivations from the intent of this protocol
are not validated by our institution. 3) Our institution
expects all authors to comply with all applicable
personal data protection rules (such as the European
Data Protection Regulation 2016/679, of April 27,
2016). 4) Our institution reserves the right to request
to omit our results from a drafted publication if the
findings could inflict reputational or institutional
harm.
https://www.optum.com/business/solutions/life-sciences/real-world-data/ehr-data.html
SIDIAP Information System for Research in Primary Care (SIDIAP)
The Information System for Research in Primary
Care (SIDIAP; www.sidiap.org) is a primary care
records database that covers approximately 80%
of the population of Catalonia, North-East Spain.
Healthcare is universal and taxpayer-funded
in the region, and primary care physicians are
gatekeepers for all care and responsible for repeat
prescriptions.
1) When using our results, you must always use
this specific name and this citation when referring
to our database. No other labels should be used
in presenting our results: Information System for
Research in Primary Care (SIDIAP). 2) The results
are considered fit-for-use and were generated for
this specific protocol. Derivations from the intent of
this protocol are not validated by our institution. 3)
Our institution expects all authors to comply with all
applicable personal data protection rules (such as
the European Data Protection Regulation 2016/679,
of April 27, 2016). 4) Our institution reserves the
right to request to omit our results from a drafted
publication if the findings could inflict reputational or
institutional harm.
https://www.sidiap.org/index.php/en
STARR-OMOP STARR-OMOP
STAnford medicine Research data Repository, a
clinical data warehouse containing live Epic data
from Stanford Health Care, the Stanford Children’s
Hospital, the University Healthcare Alliance and
Packard Children's Health Alliance clinics and other
auxiliary data from hospital applications such
as radiology PACS. The STARR platform is developed
and operated by the Stanford Medicine Research IT
team and is made possible by the Stanford School
of Medicine Research Office. https://arxiv.org/abs/2003.10534
1) When using our results, you must always use
this specific name and this citation when referring
to our database. No other labels should be used in
presenting our results. 2) The results are considered
fit-for-use and were generated for this specific
protocol. Derivations from the intent of this protocol
are not validated by our institution. 3) Our institution
expects all authors to comply with all applicable
personal data protection rules. 4) Our institution
reserves the right to request to omit our results
from a drafted publication if the findings could inflict
reputational or institutional harm.
https://med.stanford.edu/starr-omop.html
VA-OMOP Department of Veterans Affairs
VA OMOP data reflects the national Department
of Veterans Affairs health care system, which is
the largest integrated provider of medical and
mental health services in the United States. Care
is provided at 170 VA Medical Centers and 1,063
outpatient sites serving more than 9 million
enrolled Veterans each year.
1) Please allow for 2 weeks lead time for all
publications using these results to go through
internal review process. 2) When using our results,
you must always use this specific name and this
citation when referring to our database. No other
labels should be used in presenting our results. We
would like to have the name and description of the
database standardized. 3) The results are considered
fit-for-use and were generated for this specific
protocol. Derivations from the intent of this protocol
are not validated by our institution. 4) Our institution
expects all authors to comply with all applicable
personal data protection rules (such as the European
Data Protection Regulation 2016/679, of April 27,
2016). 5) Our institution reserves the right to request
to omit our results from a drafted publication if the
findings could inflict reputational or institutional
harm. In line with item 3, we would like to make
sure that data created and validated with one use
case in mind are still fit for other use cases. We do not
anticipate examples where data would produce such
harm (outside of some data quality issue / need for
retraction), but if that were the case, we would need
to alert VA leadership and ensure the wording was
objective. 6) We need to acknowledge our funding
using language like: "This work was supported
using resources and facilities of the Department of
Veterans Affairs (VA) Informatics and Computing
Infrastructure (VINCI), VA HSR RES 13–457." This can
be shortened and arranged in the acknowledgement
section with others. 7) We need a disclaimer such
as: "The views expressed are those of the authors
and do not necessarily represent the views or
policy of the Department of Veterans Affairs or the
United States Government." This can be shortened and
combined with other institutions' disclaimers.
https://www.data.va.gov/