Phenotyping HF in the VA national EHR using ML and NLP models. After developing and testing the NLP and ML models in the model development cohort of 20,000 patients, we applied them to the gold standard cohort of 200 patients. For each of the 200 patients, we calculated an NLP score and an ML score and derived a combined NLP + ML score. For each model, we selected the threshold score that gave the highest F score. Using these thresholds, we classified each patient in the gold standard cohort as having or not having HF. The NLP + ML model had the highest PPV and was chosen as the best-performing model. All 3 models performed better than the traditional ICD code-based approaches for identifying HF cohorts. The ‘Inpatient’ HF cohort was defined by ≥1 ICD code for HF as the principal hospital discharge diagnosis, and the ‘Outpatient’ HF cohort by ≥2 ICD codes for HF as the primary diagnosis of outpatient encounters; the ‘Either’ cohort included patients with ≥1 inpatient and/or ≥2 outpatient HF diagnoses. Abbreviations: AI, artificial intelligence; EHR, electronic health record; EPRP, External Peer Review Program; HF, heart failure; ML, machine learning; NLP, natural language processing; PPV, positive predictive value; SVM, support vector machine; VA, Veterans Affairs.
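The threshold selection and the ICD code rules described above can be illustrated with a minimal sketch. It rests on several assumptions not stated in the caption. "F score" is taken to mean F1, the harmonic mean of PPV and sensitivity. Every observed score is tried as a candidate cut-off, and a patient counts as HF when score ≥ threshold. The ‘Either’ cohort is read as meeting at least one of the two ICD criteria. How the combined NLP + ML score was derived is not specified, so any per-patient score can be passed in. The function names (`evaluate`, `best_f1_threshold`, `icd_cohorts`) are hypothetical and are not the study's code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ThresholdResult:
    threshold: float
    ppv: float          # precision = TP / (TP + FP)
    sensitivity: float  # recall = TP / (TP + FN)
    f1: float           # harmonic mean of PPV and sensitivity (assumed "F score")


def evaluate(preds: Sequence[bool], labels: Sequence[bool]) -> tuple[float, float, float]:
    """Return (PPV, sensitivity, F1) of binary predictions against gold-standard labels."""
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum(y and not p for p, y in zip(preds, labels))
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sens = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * sens / (ppv + sens) if ppv + sens else 0.0
    return ppv, sens, f1


def best_f1_threshold(scores: Sequence[float], labels: Sequence[bool]) -> ThresholdResult:
    """Try each observed score as a cut-off (score >= threshold => HF); keep the highest F1."""
    best: Optional[ThresholdResult] = None
    for t in sorted(set(scores)):
        preds = [s >= t for s in scores]
        ppv, sens, f1 = evaluate(preds, labels)
        # Strict '>' means ties keep the lower threshold found first.
        if best is None or f1 > best.f1:
            best = ThresholdResult(t, ppv, sens, f1)
    assert best is not None, "scores must be non-empty"
    return best


def icd_cohorts(n_inpatient_principal: int, n_outpatient_primary: int) -> dict[str, bool]:
    """Rule-based HF cohorts from counts of HF ICD codes (hypothetical helper).

    'Either' is assumed to mean meeting the inpatient OR the outpatient rule.
    """
    inpatient = n_inpatient_principal >= 1  # >=1 principal discharge diagnosis
    outpatient = n_outpatient_primary >= 2  # >=2 primary outpatient diagnoses
    return {"Inpatient": inpatient, "Outpatient": outpatient, "Either": inpatient or outpatient}


if __name__ == "__main__":
    # Toy scores and gold-standard labels, made up for illustration (not study data).
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
    labels = [True, True, False, True, False, False]
    print(best_f1_threshold(scores, labels))
    print(icd_cohorts(n_inpatient_principal=0, n_outpatient_primary=2))
```

Sweeping only the observed scores is enough because the predicted labels, and therefore PPV, sensitivity, and F1, change only when the threshold crosses one of those values. The same `evaluate` function can score the ICD-defined cohorts against the gold standard labels, which makes a like-for-like PPV comparison with the NLP, ML, and NLP + ML models possible.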