2021 Jul 7;4(7):e2114723. doi: 10.1001/jamanetworkopen.2021.14723

Table 1. Data Sources, Extraction Method, and Description.

| Key variable | Data source (structured data) | Extraction method (unstructured data) | Variable description |
|---|---|---|---|
| **Demographic characteristics** | | | |
| Birth date | Demographics | NA | NA |
| Sex | Demographics | NA | NA |
| Race/ethnicity | Demographics | NA | NA |
| **Clinical outcomes** | | | |
| Diagnosis date | Diagnosis codes (ICD-9 and ICD-10 codes) | NICE | Date of lung cancer diagnosis |
| Date of death | Death report | NA | NA |
| **Prognostic factors** | | | |
| Stage | NA | NICE | TNM stage and clinical stage |
| Histologic type | NA | NICE | NSCLC (ie, adenocarcinoma, squamous cell carcinoma, other non–small cell carcinoma) or small cell lung cancer |
| Smoking status | NA | NA | Smoker or nonsmoker |
| BMI | Vital signs | EXTEND | Calculated as weight in kilograms divided by height in meters squared |
| ECOG performance status | NA | EXTEND | Grade 0 to 4 |
| Laboratory tests | Laboratory test codes | NA | Complete blood count, metabolic panel, lipid panel, liver panel, hemoglobin A1C, and urinalysis |
| Tumor somatic variant information | NA | NICE | Genetic alterations in EGFR, KRAS, ALK, ROS1, MET, or BRAF |
| Medical history | Diagnosis codes (ICD-9 and ICD-10 codes) | NA | Respiratory disease (eg, COPD and asthma), cardiovascular disease, type 2 diabetes, and others |
| **Treatment** | | | |
| Surgical treatment | Procedure codes (CPT and ICD-10 codes) | NA | Surgical procedure (ie, lobectomy, segmentectomy, wedge resection, video-assisted thoracic surgical procedure) with surgical admission and discharge dates |
| Radiation therapy | Procedure codes (CPT and ICD-10 codes) | NA | Radiation therapy procedure and treatment start and end dates |
| Chemotherapy | Procedure codes (CPT and ICD-10 codes) and medication name codes | NA | Chemotherapy procedures, chemotherapy drugs, and treatment start and end dates |
| Targeted therapy and immunotherapy | Medication name codes | NA | Targeted therapy and immunotherapy drugs and treatment start and end dates |

Abbreviations: BMI, body mass index; COPD, chronic obstructive pulmonary disease; CPT, Current Procedural Terminology; ECOG, Eastern Cooperative Oncology Group; EXTEND, Extraction of Electronic Medical Record Numerical Data; ICD-9, International Classification of Diseases, Ninth Revision; ICD-10, International Statistical Classification of Diseases and Related Health Problems, Tenth Revision; NA, not applicable; NICE, Natural Language Processing Interpreter for Cancer Extraction; NSCLC, non–small cell lung cancer; TNM, tumor, node, metastasis.
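The BMI row specifies a formula for the structured vital-signs source. The Python sketch below illustrates that calculation. It assumes weight and height arrive as one paired measurement; the `VitalSigns` record and the helper names are hypothetical and are not part of the study's pipeline. The unit helpers cover sources that record weight in pounds or height in inches. They use the exact international definitions: 1 lb = 0.45359237 kg and 1 in = 0.0254 m.

```python
from dataclasses import dataclass
from typing import Optional

# Exact conversion factors (international pound and inch).
KG_PER_LB = 0.45359237
M_PER_IN = 0.0254


@dataclass(frozen=True)
class VitalSigns:
    """Hypothetical paired weight/height measurement from a vital-signs record."""
    weight_kg: Optional[float]
    height_m: Optional[float]


def lb_to_kg(pounds: float) -> float:
    """Convert pounds to kilograms."""
    return pounds * KG_PER_LB


def in_to_m(inches: float) -> float:
    """Convert inches to meters."""
    return inches * M_PER_IN


def bmi(vitals: VitalSigns) -> Optional[float]:
    """BMI = weight (kg) / height (m) squared; None if a value is missing or non-positive."""
    w, h = vitals.weight_kg, vitals.height_m
    if w is None or h is None or w <= 0 or h <= 0:
        return None
    return w / h ** 2


# Usage: 70 kg and 1.75 m gives about 22.9 kg/m^2.
print(round(bmi(VitalSigns(70.0, 1.75)), 1))
```

The function returns `None` rather than raising on missing or implausible values. This makes it easy to fall back to a BMI value extracted from clinical notes, for example by EXTEND, when the structured measurement is incomplete. That fallback rule is an assumption and is not described in the table.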
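Several rows derive dates from coded structured data. The diagnosis date comes from ICD-9/ICD-10 diagnosis codes, and treatment start and end dates come from procedure and medication codes. The table does not state the exact rules, so the Python sketch below assumes a common convention:

- The diagnosis date is the date of the earliest lung cancer diagnosis code.
- A treatment's start and end dates are the first and last dates of its qualifying codes.

The code categories (ICD-9 162 and ICD-10 C34), the `CodedEvent` record, and all function names are illustrative assumptions. Each function takes the events of a single patient. NICE-extracted dates from clinical notes are not modeled.

```python
from dataclasses import dataclass
from datetime import date
from typing import AbstractSet, Iterable, Optional, Tuple

# Assumed lung cancer code categories (the study's exact code list is not given):
# ICD-9 162 (trachea, bronchus, and lung) and ICD-10 C34 (bronchus and lung).
LUNG_CANCER_PREFIXES = {"ICD-9": "162", "ICD-10": "C34"}


@dataclass(frozen=True)
class CodedEvent:
    """Hypothetical structured record: one coded event for one patient."""
    system: str  # eg, "ICD-9", "ICD-10", "CPT"
    code: str
    event_date: date


def normalize(code: str) -> str:
    """Strip dots and whitespace so '162.3' and '1623' compare equal."""
    return code.replace(".", "").strip().upper()


def is_lung_cancer_dx(event: CodedEvent) -> bool:
    """True if the event's code falls in an assumed lung cancer category."""
    prefix = LUNG_CANCER_PREFIXES.get(event.system)
    return prefix is not None and normalize(event.code).startswith(prefix)


def diagnosis_date(events: Iterable[CodedEvent]) -> Optional[date]:
    """Earliest lung cancer diagnosis code date, or None if there is none."""
    return min((e.event_date for e in events if is_lung_cancer_dx(e)), default=None)


def treatment_window(
    events: Iterable[CodedEvent], codes: AbstractSet[Tuple[str, str]]
) -> Optional[Tuple[date, date]]:
    """(first, last) date of events whose (system, normalized code) is in `codes`."""
    dates = [e.event_date for e in events if (e.system, normalize(e.code)) in codes]
    if not dates:
        return None
    return min(dates), max(dates)


history = [
    CodedEvent("ICD-10", "J44.9", date(2014, 1, 5)),  # COPD, not lung cancer
    CodedEvent("ICD-9", "162.3", date(2015, 3, 1)),
    CodedEvent("ICD-10", "C34.11", date(2016, 6, 1)),
]
print(diagnosis_date(history))  # 2015-03-01
```

Codes are matched as `(system, code)` pairs, so a CPT code can never collide with an ICD code that happens to share the same characters. The sets of treatment codes must be supplied by the caller. No real CPT or medication code list is implied here.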