2023 Feb 3;13:1971. doi: 10.1038/s41598-023-27481-y

Table 2.

Summary of the NLP/ML component outcomes.

| Phenotype | People Involved | Charts reviewed | Precision | Recall | Comments |
|---|---|---|---|---|---|
| Chronic rhinosinusitis | 2 | 126 | 76% → 78–83% | 97% → 100% | Specificity also improved significantly |
| ECG traits | 1–3 | 1050 | Cases: 80–100%; Controls: 94–99% | N/A | Unable to extract 1 sub-phenotype; precision varied between sub-phenotypes |
| Systemic lupus erythematosus | 2–3 | 1022 | 99% → 96% | 79% → 91% | 2 of 3 sub-phenotypes performed better at the validation site |
| Asthma/chronic obstructive pulmonary disease overlap | 1–2 | 300 | 90% → 91% | 38% → 54% | Although performance improved overall, it was worse at the validation site, possibly because of how the ML model used feature counts |
| Familial hypercholesterolemia | 1–4 | 150 | 96–98% → 74–96% | N/A | Negative predictive value decreased |
| Atopic dermatitis | 1–3 | 150 | 73–79% → 72–84% | 51–54% → 63–75% | Mixed results across sub-phenotypes and sites |

The “People Involved” column lists the estimated number (or range) of full-time equivalent persons involved in all aspects of the implementation, including programmers, clinicians, and computational linguists. “Charts reviewed” is the total number of patient charts reviewed for each phenotype: the sum of charts reviewed for cases (and for controls, if applicable) at both the lead and validating sites. The Precision and Recall columns compare the original rule-based computable phenotype algorithm with the new rule-based algorithm that adds NLP components; arrows indicate the change from the original to the new algorithm. Some algorithms report a range for precision or recall, either because multiple (secondary) validation sites reviewed patient charts and calculated accuracy statistics separately, or because precision and recall were measured separately for sub-phenotypes. N/A, not applicable: recall was not targeted for improvement in every phenotype, so it was not calculated for all of them.
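Precision and recall in the table, like the specificity and negative predictive value mentioned in the Comments column, are ratios of confusion-matrix counts over chart-reviewed patients. The sketch below shows how such figures, and the table's "original → new" arrows, could be computed. It assumes that each reviewed chart is labeled case or non-case by both the algorithm and the reviewer. The names `ChartReviewCounts`, `metrics`, and `compare` are hypothetical, and the counts are illustrative only: they are not the study's data, and this is not the study's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChartReviewCounts:
    """Hypothetical confusion-matrix counts from manual chart review."""
    tp: int  # algorithm flags a case; reviewer confirms it
    fp: int  # algorithm flags a case; reviewer rejects it
    fn: int  # algorithm misses a case the reviewer identifies
    tn: int  # algorithm and reviewer both say "not a case"


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a metric is undefined,
    # e.g. recall when no true cases were reviewed.
    return num / den if den else float("nan")


def metrics(c: ChartReviewCounts) -> dict[str, float]:
    """Standard chart-review accuracy statistics."""
    return {
        "precision": _ratio(c.tp, c.tp + c.fp),    # positive predictive value
        "recall": _ratio(c.tp, c.tp + c.fn),       # sensitivity
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),          # negative predictive value
    }


def compare(original: ChartReviewCounts, with_nlp: ChartReviewCounts) -> dict[str, str]:
    """Render each metric as 'original → new', mirroring the table's arrows."""
    before, after = metrics(original), metrics(with_nlp)
    return {name: f"{before[name]:.0%} → {after[name]:.0%}" for name in before}


if __name__ == "__main__":
    # Illustrative counts only; not taken from any phenotype in the table.
    rule_based = ChartReviewCounts(tp=45, fp=5, fn=15, tn=35)
    rule_based_plus_nlp = ChartReviewCounts(tp=54, fp=4, fn=6, tn=36)
    for name, change in compare(rule_based, rule_based_plus_nlp).items():
        print(f"{name}: {change}")
```

When a phenotype has several validation sites or sub-phenotypes, calling `metrics` once per site or sub-phenotype and reporting the minimum and maximum would yield ranges like those in the table.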