Table 1.
Classification method | Classifier description | Minimum selection criteria: ICD9/10 diagnosis | Minimum selection criteria: visit | Minimum selection criteria: other |
---|---|---|---|---|
Rule-based | | | | |
ICD-strict^a | 3 COPD-specific codes | 3 or more COPD-specific codes | None | |
ICD-broad^b | 2 COPD-specific codes | 2 or more COPD-specific codes | None | |
Control selection | 0 COPD-specific codes | Subjects with no history of COPD related codes | 2 encounters in MGB Biobank | |
Model-based^c | | | | |
Automatic extraction NLP features | | | | |
SAFE-NLP | Model selected from surrogate assisted feature extraction with natural language processing of unstructured EHR data (narrative text from clinic notes) | At least 1 COPD-specific code and at least 3 broad COPD codes | 1 visit with electronic clinical note in the EHR | Selected by classifier |
Curated (CRT) features | | | | |
CRT_PFT- | Model selected from literature-based and expert-curated feature inputs primarily derived from structured data, excluding measures of spirometric FEV1/FVC performance | At least 1 COPD-specific code and at least 3 broad COPD codes | 1 visit with electronic clinical note in the EHR | Selected by classifier |
CRT_PFT+ | Model selected from the feature space of CRT_PFT-, but inclusive of measures of spirometric FEV1/FVC performance | At least 1 COPD-specific code and at least 3 broad COPD codes | 1 visit with electronic clinical note in the EHR | Selected by classifier |
Mixed (automatic + curated) features | | | | |
CRT + SAFE | Model based on combining the full feature space for CRT_PFT+ and SAFE | At least 1 COPD-specific code and at least 3 broad COPD codes | 1 visit with electronic clinical note in the EHR | Selected by classifier |
^a COPD-specific codes include: 1) ICD9: 491.2, 493.2, and 496.*; 2) ICD10: J43.* or J44.*.
^b Broad COPD codes include any codes with the following base numbers: 1) ICD9: 491.*, 492.*, 493.2*, and 496.*; 2) ICD10: J40.*, J41.*, J42.*, J43.*, J44.*.
^c All model-based algorithms were developed using probability-based thresholding via logistic regression models selected using a threshold for specificity at 95%.
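
The rule-based criteria and the 95% specificity threshold described in Table 1 can be illustrated with a minimal Python sketch. This is not the authors' code, and it rests on several assumptions that the table does not settle: codes are matched by string prefix (so 491.2 also matches 491.21), each recorded code occurrence counts once, "no history of COPD related codes" is read as zero broad codes, and a subject is called positive when the model probability is at or above the threshold. The function and parameter names (`classify_rule_based`, `threshold_at_specificity`, `n_note_visits`) are hypothetical.

```python
# Prefixes taken from the table footnotes; prefix matching is an assumption.
SPECIFIC_PREFIXES = ("491.2", "493.2", "496", "J43", "J44")
BROAD_PREFIXES = ("491", "492", "493.2", "496",
                  "J40", "J41", "J42", "J43", "J44")


def count_matching(codes, prefixes):
    """Count recorded code occurrences that start with any given prefix."""
    return sum(1 for c in codes if c.strip().upper().startswith(prefixes))


def classify_rule_based(codes, n_encounters, n_note_visits=0):
    """Return one boolean flag per selection rule for a single subject."""
    n_specific = count_matching(codes, SPECIFIC_PREFIXES)
    n_broad = count_matching(codes, BROAD_PREFIXES)
    return {
        "icd_strict": n_specific >= 3,
        "icd_broad": n_specific >= 2,
        # Assumption: controls have zero broad codes and >= 2 encounters.
        "control": n_broad == 0 and n_encounters >= 2,
        # Minimum criteria for entering the model-based classifiers.
        "model_eligible": (n_specific >= 1 and n_broad >= 3
                           and n_note_visits >= 1),
    }


def threshold_at_specificity(scores, labels, target=0.95):
    """Smallest observed score t such that calling score >= t positive
    yields specificity >= target on the labeled data (label 0 = non-COPD)."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not negatives:
        raise ValueError("need at least one negative (non-COPD) example")
    for t in sorted(set(scores)):
        # Specificity: fraction of negatives scored below the threshold.
        specificity = sum(s < t for s in negatives) / len(negatives)
        if specificity >= target:
            return t
    return float("inf")  # no observed score reaches the target


if __name__ == "__main__":
    print(classify_rule_based(["J44.1", "496", "491.21"], n_encounters=5))
```

Returning one flag per rule, rather than a single label, reflects that every ICD-strict subject also meets the ICD-broad criterion. In practice the threshold would be chosen on a chart-reviewed validation set and then applied to the predicted probabilities of the logistic regression model.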