. 2021 Oct 7;11:19959. doi: 10.1038/s41598-021-98719-w

Table 1.

Algorithms for classifying chronic obstructive pulmonary disease.

| Classification method | Classifier description | Minimum selection criteria: ICD-9/10 diagnosis | Minimum selection criteria: visits | Other criteria |
|---|---|---|---|---|
| Rule-based | | | | |
| ICD-strictᵃ | 3 COPD-specific codes | 3 or more COPD-specific codes | None | |
| ICD-broadᵇ | 2 COPD-specific codes | 2 or more COPD-specific codes | None | |
| Control selection | 0 COPD-specific codes | Subjects with no history of COPD-related codes | 2 encounters in MGB Biobank | |
| Model-basedᶜ | | | | |
| Automatic extraction (NLP) features | | | | |
| SAFE-NLP | Model selected from surrogate-assisted feature extraction with natural language processing of unstructured EHR data (narrative text from clinic notes) | At least 1 COPD-specific code and at least 3 broad COPD codes | 1 visit with an electronic clinical note in the EHR | Selected by classifier |
| Curated (CRT) features | | | | |
| CRTPFT- | Model selected from literature-based and expert-curated feature inputs, primarily derived from structured data, excluding measures of spirometric FEV1/FVC performance | At least 1 COPD-specific code and at least 3 broad COPD codes | 1 visit with an electronic clinical note in the EHR | Selected by classifier |
| CRTPFT+ | Model selected from the feature space of CRTPFT-, but including measures of spirometric FEV1/FVC performance | At least 1 COPD-specific code and at least 3 broad COPD codes | 1 visit with an electronic clinical note in the EHR | Selected by classifier |
| Mixed (automatic + curated) features | | | | |
| CRT + SAFE | Model combining the full feature spaces of CRTPFT+ and SAFE | At least 1 COPD-specific code and at least 3 broad COPD codes | 1 visit with an electronic clinical note in the EHR | Selected by classifier |

ᵃ COPD-specific codes include: (1) ICD-9: 491.2, 493.2, and 496.*; (2) ICD-10: J43.* or J44.*.

ᵇ Broad COPD codes include any codes with the following base numbers: (1) ICD-9: 491.*, 492.*, 493.2*, and 496.*; (2) ICD-10: J40.*, J41.*, J42.*, J43.*, and J44.*.

ᶜ All model-based algorithms were developed using logistic regression models with probability-based thresholding; the probability threshold was selected to achieve 95% specificity.
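The rule-based rows and the model-based minimum selection criteria come down to counting coded diagnoses against the code lists in footnotes a and b. The Python sketch below illustrates that counting; it is not the authors' implementation. Several details are assumptions not stated in the table: each diagnosis occurrence is stored as a hypothetical `(system, code)` tuple with dotted codes such as `"J44.9"`; `"496.*"` matches `"496"` itself and any `"496.<suffix>"`, `"493.2*"` matches any code beginning with `"493.2"`, and an unstarred entry such as `"491.2"` matches exactly (adjust if subcodes like 491.21 should count); "COPD-related codes" for controls means the broad list; and encounter and note-visit counts are "at least" thresholds. The helpers `rule_based_flags` and `eligible_for_model` are hypothetical names.

```python
# Sketch of the rule-based COPD criteria in Table 1 (not the authors' code).
# Code lists are transcribed from footnotes a and b. Pattern convention
# (an assumption of this sketch):
#   "496.*"  -> "496" itself or any "496.<suffix>"
#   "493.2*" -> any code starting with "493.2"
#   "491.2"  -> exact match only
COPD_SPECIFIC = {
    "ICD9": ["491.2", "493.2", "496.*"],
    "ICD10": ["J43.*", "J44.*"],
}
COPD_BROAD = {
    "ICD9": ["491.*", "492.*", "493.2*", "496.*"],
    "ICD10": ["J40.*", "J41.*", "J42.*", "J43.*", "J44.*"],
}


def matches(code: str, pattern: str) -> bool:
    """Return True if a dotted ICD code matches a footnote-style pattern."""
    if pattern.endswith(".*"):
        base = pattern[:-2]
        return code == base or code.startswith(base + ".")
    if pattern.endswith("*"):
        return code.startswith(pattern[:-1])
    return code == pattern


def count_codes(codes: list[tuple[str, str]], code_set: dict[str, list[str]]) -> int:
    """Count diagnosis occurrences (system, code) that fall in code_set."""
    return sum(
        1
        for system, code in codes
        if any(matches(code, p) for p in code_set.get(system, []))
    )


def rule_based_flags(codes: list[tuple[str, str]], n_encounters: int) -> dict[str, bool]:
    """Hypothetical helper applying the rule-based rows of Table 1."""
    n_specific = count_codes(codes, COPD_SPECIFIC)
    n_broad = count_codes(codes, COPD_BROAD)
    return {
        "ICD-strict": n_specific >= 3,
        "ICD-broad": n_specific >= 2,
        # Assumed: "no COPD-related codes" = no broad codes; ">= 2" encounters.
        "control": n_broad == 0 and n_encounters >= 2,
    }


def eligible_for_model(codes: list[tuple[str, str]], n_note_visits: int) -> bool:
    """Hypothetical helper for the model-based minimum selection criteria."""
    return (
        count_codes(codes, COPD_SPECIFIC) >= 1
        and count_codes(codes, COPD_BROAD) >= 3
        and n_note_visits >= 1
    )


example = [("ICD10", "J43.9"), ("ICD9", "491.2"), ("ICD10", "J41.0")]
print(rule_based_flags(example, n_encounters=3))
# {'ICD-strict': False, 'ICD-broad': True, 'control': False}
print(eligible_for_model(example, n_note_visits=1))  # True
```

Because ICD-strict requires more COPD-specific codes than ICD-broad, every ICD-strict case also satisfies ICD-broad, so the sketch returns independent flags rather than a single label.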
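Footnote c states that each model's probability cut-off was set to reach 95% specificity. A minimal sketch of that selection step follows, assuming predicted probabilities from an already fitted logistic regression model and binary reference labels (1 = COPD, 0 = not COPD). The function names, the rule "positive if probability >= threshold", and picking the lowest qualifying observed score (which keeps sensitivity as high as possible at that specificity) are assumptions of this sketch, not details taken from the paper. The toy data are made up for illustration.

```python
# Sketch: pick a probability cut-off that reaches a target specificity.
# Inputs are assumed: model probabilities and 0/1 reference labels.


def specificity(probs: list[float], labels: list[int], threshold: float) -> float:
    """Fraction of reference negatives (label 0) scored below the threshold."""
    neg = [p for p, y in zip(probs, labels) if y == 0]
    return sum(p < threshold for p in neg) / len(neg)


def sensitivity(probs: list[float], labels: list[int], threshold: float) -> float:
    """Fraction of reference positives (label 1) scored at or above the threshold."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    return sum(p >= threshold for p in pos) / len(pos)


def select_threshold(probs: list[float], labels: list[int], target: float = 0.95) -> float:
    """Lowest observed score whose specificity is at least `target`.

    A case is called positive when its probability is >= the threshold.
    float("inf") (call nothing positive) is the fallback; its specificity
    is always 1.0, so the loop returns whenever target <= 1.0.
    """
    for t in sorted(set(probs)) + [float("inf")]:
        if specificity(probs, labels, t) >= target:
            return t
    raise ValueError("target specificity must be <= 1.0")


# Made-up toy data: 20 reference negatives, 4 reference positives.
probs = [i / 100 for i in range(1, 21)] + [0.15, 0.5, 0.8, 0.9]
labels = [0] * 20 + [1] * 4
t = select_threshold(probs, labels)
print(t, specificity(probs, labels, t), sensitivity(probs, labels, t))
# 0.2 0.95 0.75
```

Scanning only observed scores keeps the search finite: any cut-off between two consecutive observed scores produces the same classifications as the upper of the two.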