Table 2.
Examples | Strengths | Limitations | Applicability/Utility to Cancer CER |
---|---|---|---|
Existing and Fixed Data Sources | |||
1) Experimental studies (trials) data (e.g., clinical trials, pragmatic trials): | |||
■ ACCENT/ PS2 ■ Women's Health Initiative (WHI) ■ Physicians’ Health Study (PHS) ■ Women's Health Study (WHS) |
■ Detailed and unbiased information on treatment, and important clinical covariates ■ Enormous breadth and diversity of data (across 12 NCI cooperative groups) |
■Limited generalizability ■ Expensive to conduct, requires lengthy follow-up for many outcomes ■ Limited sample sizes ■ Highly specific (i.e. usually single treatment/intervention) ■ Limited in essential/important covariates |
■Utility for CER depends on type of experimental study ■ Broadly defined (or population-based) trials can be useful for CER; but require extensive inclusion of covariates outside of the main trial aims ■ Secondary use of experimental studies for CER could be improved through investments in 1) Pragmatic clinical trials and 2) methods / design development |
2) Non-experimental (observational) studies data | |||
■North Carolina-Louisiana Prostate Cancer Project ■ Cancer Care Outcomes Research and Surveillance Consortium (CanCORS) ■ Health Professionals Follow-Up Study (HPFS) ■ Nurses' Health Study (NHS) ■ American Cancer Society Cohort |
■Extensive data on diagnosis, procedures and outcomes ■ Rich in covariates (risk factors, important confounders) ■ Often include patient medical records ■ Can be population-based |
■Expensive to develop and maintain ■ Logistics of study development limit data availability and addition of new hypotheses ■ Several biases may exist: selection; information; recall; and response ■ Unclear event temporality between data collection waves ■ Limited in scope, statistical power beyond initial study aims ■ Proprietary data requiring extensive protocols, procedures |
■ Can be leveraged for comparative effectiveness depending on data quality and extent of biases ■ Utility for CER also dependent on study design, quality/completeness of measures, and broad inclusion of covariates ■ Can be strengthened through potential data linkages to claims or EHR data, which can augment or offset biases/limitations (can provide temporality of events, verification of treatment/outcomes, etc.) |
3) Registry data | |||
■Surveillance, Epidemiology, and End Results (SEER) ■ National Program of Cancer Registries (NPCR) ■ National Cancer Data Base (NCDB) ■ National Oncologic Positron Emission Tomography Registry (NOPR) |
■Rich disease information ■ Clinical information at point of care or diagnosis ■ Simultaneously collected with diagnosis and treatment ■ Opportunity for recruitment into cohorts or trials ■ Can link with administrative data |
■Potential sampling biases (selection, inclusion, etc.) ■ Questionable generalizability ■ Primarily limited to first occurrence of event or disease and limited inclusion of covariates ■ Unknown response, toxicity, patient reported outcomes ■ Challenging for longitudinal data capture ■ Sparse patient identifiers ■ Challenging for selecting controls / comparator populations |
■Do not provide enough complete data for rigorous CER ■ Linkages to additional data are necessary to provide missing information ■ Dearth of literature on solutions/methods for inherent biases, interoperable study design, and evaluation/application of comparator populations |
4) Administrative and claims data | |||
■Most health insurance programs: Medicare; Medicaid; Blue Cross / Blue Shield, etc. ■ Medstat / Marketscan ■ United Health |
■Represents large proportion of US population ■ Rich patient-level data: demographics, procedures, treatments ■ Includes temporality of events ■ Some include organizational/ provider characteristics ■ Most have unique identifiers enabling linkage to other data |
■Design/structure often impacts data sensitivity/specificity ■ Missing important clinical etiologic information ■ Includes date or type of testing procedures, but no results (e.g., pathology, tumor response, genetics, vital stats, etc.) ■ High patient turnover ■ Complicated data structure requires significant learning curve and programming resources ■ Burdensome and prohibitive data use agreements ■ Expensive to obtain ■ Untimely data releases with significant time lags |
■Missing key CER components including vital tumor and disease information ■ Linkages can supplement missing information, but costs and/or data use agreements (DUAs) often inhibit additional linkages ■ Utility for CER would be greatly improved through institutional and governmental policies that overcome limitations (i.e., funding, training, collaboration) |
5) Electronic health records | |||
■Health care systems: Veterans Administration (VA); HMO-network; Kaiser; Mayo; Geisinger; US Oncology; UK General Practice Research Database (GPRD) ■Large vendors: GE Health; Allscripts/Misys; Epic; McKesson; NextGen |
■Includes multiple data components (practice management, electronic patient record, patient portal) ■Fully integrated EHRs provide clinical information, claims, tumor specifics, longitudinal follow-up, objectively measured events ■Allows for studies of toxicity, quality of life, natural history |
■Populations are not generalizable ■Lack of standardization of patient information and clinical measures between systems (technology, data structure, and coding) ■Missing or insufficient data elements necessary for CER ■Imperfect record keeping/follow up - Patients not consistently maintained within a single system/EHR ■Enormous expense to obtain data from private sector/vendors |
■Currently there is limited utility for EHR data from private vendors ■However, examples from the VA and universal/national systems (UK, Canada) exemplify the potential of EHR sources ■Future utility dependent on: standardization of measures and data systems/interoperability; standard linkage variables; public and private institutional data governance and stewardship |
6) Other Data | |||
■Genetic and genomic data ■Geospatial data ■Environmental monitoring data ■Over-the-counter drug purchasing ■Health seeking on internet ■Patient-networking sites [66] ■Syndromic surveillance |
■Data at both patient and ecological level ■Information on behavioral and environmental risks ■Can provide information on disease determinants ■Self-reported experiences, exposures, outcomes |
■ Unclear how to identify, define and utilize these data |
■ Utility to CER dependent on integration into other data, specifically clinical care data |
Hybrid Data Sources | |||
7) Linked clinical and claims data | |||
■SEER-Medicare ■State Cancer Registry – Medicare/Medicaid ■WHI-Medicare |
■Includes clinical and health services data ■Provides temporality of events ■Large population samples; ability to study rare events/treatments ■Provides access to controls or comparison populations ■Allows for adjudication/validation of events (i.e., self-reported) ■Can detect recurrence |
■Missing information (e.g., HMO or supplemental insurance); often highly specific populations (>65, disabled, etc.) ■Non-covered services are excluded (e.g., prescription drugs, long-term care, free screenings) ■Missing vital clinical information (tumor response) ■Treatment rationale and test results are unknown ■Complicated algorithms needed to characterize treatment ■Large, complex data require advanced training/experience ■Delay in research access |
■Powerful for CER studies because of large, generalizable populations ■Large number of covariates and clinical information ■Lengthy follow-up available including information on temporality of treatment and events ■Could be strengthened by linkages to laboratory and clinical results |
8) Validation study data | |||
■Internal validation studies [49, 67] ■External validation studies [47] |
■Rich disease information ■Used to minimize limitations of other data ■Can give estimates of associations not discernible within data |
■Few validation studies exist for CER ■Methodologic limitations and lack of model transportability to CER |
■To be useful for CER, an investment in methodologic work is required, similar to P01 CA 142538 “Statistical Methods for Cancer Clinical Trials” (PI, Kosorok) ■Validation studies could lead to immediate return on investment with regard to leveraging existing data for CER |