Author manuscript; available in PMC: 2020 Sep 1.
Published in final edited form as: Cancer. 2019 May 15;125(17):2926–2934. doi: 10.1002/cncr.32118

Table 2: Challenges and Opportunities for Big Data in Sarcoma Research

Challenge: Dependence on administrative/billing data that may lack clinical accuracy
Opportunity: Improve data quality by validating codes so that key diagnoses are identified correctly

Challenge: Difficulty validating an initial coding process that is based on preexisting documented diagnosis codes
Opportunities: Multidisciplinary discussion and agreement on data elements, such as ICD-10 and ICD-O-3 codes, before submission to large databases; reduce variability in sarcoma nomenclature and cancer classification

Challenge: Errors in large databases may be amplified for rare diseases such as sarcoma
Opportunities: Improve current database architecture models to make population-based registries more clinically relevant; use sarcoma-specific databases with more granular, clinically relevant data

Challenge: Data silos created by a lack of information sharing among institutions and databases
Opportunities: Automated aggregation of structured real-world data, supplemented by NLP-assisted manual curation of unstructured clinical documentation, as in the ASCO CancerLinQ initiative and Flatiron Health's database; linkage to administrative databases to validate information in real-world-evidence-based datasets
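The final opportunity in the table, linking a real-world dataset to administrative data to validate it, can be illustrated with a minimal sketch of deterministic record linkage. This is not a method described by the authors. It assumes (hypothetically) that both sources share a patient identifier and that diagnoses are compared at the ICD-10 category level, i.e. the first three characters of the code. The function names, field names, and sample codes are illustrative only.

```python
"""Hypothetical sketch: validate real-world-data (RWD) diagnoses against claims.

Assumptions (not from the article): both sources share a patient ID, and
agreement is judged at the 3-character ICD-10 category level.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkageReport:
    linked: int         # patients found in both sources
    concordant: int     # linked patients whose diagnosis categories agree
    unmatched_rwd: int  # RWD patients with no administrative record

    @property
    def concordance(self) -> float:
        # Share of linked patients with agreeing categories (0.0 if none linked).
        return self.concordant / self.linked if self.linked else 0.0


def icd10_category(code: str) -> str:
    """Return the 3-character ICD-10 category, ignoring dots, spaces, and case."""
    return code.replace(".", "").strip().upper()[:3]


def validate_against_claims(
    rwd: dict[str, str], claims: dict[str, set[str]]
) -> LinkageReport:
    """Link RWD records to claims by patient ID and count category agreement."""
    linked = concordant = unmatched = 0
    for patient_id, rwd_code in rwd.items():
        claim_codes = claims.get(patient_id)
        if claim_codes is None:
            unmatched += 1  # no administrative record to validate against
            continue
        linked += 1
        claim_categories = {icd10_category(c) for c in claim_codes}
        if icd10_category(rwd_code) in claim_categories:
            concordant += 1
    return LinkageReport(linked, concordant, unmatched)


if __name__ == "__main__":
    # Illustrative records only; not real patients or study data.
    rwd = {"p1": "C49.9", "p2": "C41.4", "p3": "C49.2"}
    claims = {"p1": {"C499", "I10"}, "p2": {"C79.51"}}
    report = validate_against_claims(rwd, claims)
    print(report)              # LinkageReport(linked=2, concordant=1, unmatched_rwd=1)
    print(report.concordance)  # 0.5
```

Comparing at the category level is a deliberately coarse choice that tolerates differences in coding specificity between sources. A real validation study would need a probabilistic linkage step when no shared identifier exists, and an agreed code mapping between ICD-O-3 and ICD-10.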