Journal of the American Medical Informatics Association (JAMIA). 2024 Nov 12;32(1):113–118. doi: 10.1093/jamia/ocae169

Comparison of six natural language processing approaches to assessing firearm access in Veterans Health Administration electronic health records

Joshua Trujeque, R Adams Dudley, Nathan Mesfin, Nicholas E Ingraham, Isai Ortiz, Ann Bangerter, Anjan Chakraborty, Dalton Schutte, Jeremy Yeung, Ying Liu, Alicia Woodward-Abel, Emma Bromley, Rui Zhang, Lisa A Brenner, Joseph A Simonetti
PMCID: PMC11648724  PMID: 39530748

Abstract

Objective

Access to firearms is associated with increased suicide risk. Our aim was to develop a natural language processing approach to characterizing firearm access in clinical records.

Materials and Methods

We used clinical notes from 36 685 Veterans Health Administration (VHA) patients between April 10, 2023, and April 10, 2024. We expanded preexisting firearm term sets using subject matter experts and generated 250-character snippets around each firearm term appearing in notes. Annotators labeled 3000 snippets with one of three classes. Using these annotated snippets, we compared four nonneural machine learning models (random forest, bagging, gradient boosting, and logistic regression with ridge penalization) and two versions of Bidirectional Encoder Representations from Transformers, or BERT (specifically, BioBERT and Bio-ClinicalBERT), for classifying firearm access as “definite access”, “definitely no access”, or “other”.

Results

Firearm terms were identified in 36 685 patient records (41.3%); 33.7% of snippets were categorized as “definite access”, 9.0% as “definitely no access”, and 57.2% as “other”. Among models classifying firearm access, five of six had acceptable performance. BioBERT and Bio-ClinicalBERT performed best, with F1 scores of 0.876 (95% confidence interval [CI], 0.874-0.879) and 0.896 (95% CI, 0.894-0.899), respectively.

Discussion and Conclusion

Firearm-related terminology is common in the clinical records of VHA patients. The ability to use text to identify and characterize patients’ firearm access could enhance suicide prevention efforts, and five of our six models could be used to identify patients for clinical interventions.

Keywords: NLP, suicide, firearm, veteran, gun

Introduction

The age-adjusted suicide and firearm suicide rates in the United States increased by 31% and 27%, respectively, from 2001 to 2021.1 Firearm injuries accounted for 52% of all suicides during that time period. Suicide rates are even higher in certain US subgroups, including military veterans, among whom 72% of suicides were attributed to firearm injury in 2021.2 Firearm access is a strong, independent risk factor for suicide, homicide, and unintentional injury.3–5 Healthcare systems are important settings for preventing firearm-related harms,6 especially suicides, in part because the majority of patients who die by suicide seek care prior to their deaths.7 Understanding which at-risk patients have access to firearms may be helpful in guiding targeted prevention efforts.

Despite the association between firearm access and suicide,5,8 there is no International Classification of Diseases code to indicate that a given patient has access to a firearm. Some methods to collect firearm-related data have been developed and are available in some electronic health records, such as implementation of firearm screening measures or documentation of suicide safety plans and risk assessments in clinical settings.9–11 However, completeness and accuracy of these approaches are dependent on system- and clinician-level decisions to employ those assessments, and clinicians’ decisions to ask about and document firearm access.

Use of natural language processing (NLP) to identify patients with firearm-related content in their notes is one potential approach to identifying firearm access. To our knowledge, only two studies have addressed this issue.12,13 The first was conducted in an integrated healthcare system serving 600 000 patients in a single mountain region city.12 Investigators developed a query to identify whether firearm access had been assessed among 8994 patients with either suicidal ideation or behaviors from 2010 to 2015. Queries were conducted on all clinical notes written on or within 1 month after the index date (defined as the date suicidal ideation/behavior was identified). They found that 31.2% of patients with recent suicidal ideation and 35.8% of patients with recent suicidal behavior had documentation suggesting that they had been assessed for firearm access at least once during the follow-up period. Sensitivity (97.4%), specificity (87.1%), positive predictive value (82.6%), and the F1 score (89.4%) of this approach were high. Importantly, this approach evaluated whether an assessment was conducted but was not designed to assess whether patients had access to firearms, information that is critical for risk prediction and clinical care.

In the second study, Brandt et al. developed an NLP algorithm using a large corpus of clinical notes from a cohort that included 762 953 veterans during their first year receiving care in the Veterans Health Administration (VHA).13 Using 27 firearm-related terms and three predefined firearm access classes (positive access, negative access, or ambiguous), they found that less than 10% of patients had any mention of firearms in their first year of clinical notes, and only 24.4% of those were categorized as having access to a firearm. Of note, Brandt et al. used a single machine learning approach (a random forest model). Notably, according to findings from nationally representative surveys, approximately one-half of US veterans are firearm owners and 14% of firearm owners receiving care in VHA settings report that a clinician has ever discussed firearms with them.14,15

Objective

We sought to test the hypothesis that applying a broader set of modeling approaches would generate useful information about model choices for future research and clinical use of NLP. To address this, we focused on firearm-related content among the general population of veterans receiving VHA health care.

Methods

This is a retrospective analysis of preexisting records of patients who were receiving healthcare from the VHA. Data are from the VHA Corporate Data Warehouse. This study was approved by the Minneapolis Veterans Administration Health Care System Institutional Review Board.

Study patients

We identified all patients receiving care in VHA between April 10, 2023 and April 10, 2024. Using the most recent encounter within the date range as an index date, we obtained all clinical notes for 90 days preceding the index date. Sampling was done without replacement and excluded patients sampled in preliminary work.

Term list generation

The initial term list was identified using the Unified Medical Language System Metathesaurus (n = 25 terms). We added an additional 40 terms from investigator input and literature review.12,13 Given that clinician documentation may sometimes reflect the language used by patients and those with content familiarity, we expanded the list through input from veterans, firearm users, and individuals who worked in firearm retail and shooting locations. The full list included 89 terms. We then determined the frequency of each term in the notes in the 90 days preceding the index date using a basic text string search. We narrowed our term list to the 48 terms that appeared at least once in our note sample (Supplementary Table S1).

Snippet generation

We obtained all clinical notes from the 90 days prior to the index date and created a snippet of text surrounding each firearm term that was identified. Snippets contained a 125-character span before and after each identified term. Snippets were sorted by term and placed in term sets that contained all variations of the term (eg, firearm, firearms, fire arm, and fire arms are all in a single set). Sets of terms were sorted for annotation based on terms that could identify the maximal proportion of unique veterans.
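A minimal sketch of this windowing step, assuming a simple case-insensitive string search; the study's exact extraction code is not published, and the function and variable names here are hypothetical:

```python
import re

# Hypothetical sketch of snippet generation: a 125-character span on each
# side of every firearm-term match, per the approach described above.
def make_snippets(note_text, terms, span=125):
    snippets = []
    for term in terms:
        for match in re.finditer(re.escape(term), note_text, flags=re.IGNORECASE):
            start = max(0, match.start() - span)
            end = min(len(note_text), match.end() + span)
            snippets.append({"term": term, "text": note_text[start:end]})
    return snippets

# Example: one snippet is produced for the single match of "gun".
print(make_snippets("Veteran reports keeping a gun at home.", ["gun"]))
```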

Firearm class development

Like Brandt et al.,13 we developed three initial attribute classes: definite firearm access, definitely no firearm access, and “other” firearm access (eg, historical firearm access; planning to acquire a firearm; uncertain access).

Annotation

We annotated the most frequent terms (eg, gun, shoot, ammunition) and very specific firearm terms (eg, pistol, revolver). The latter were chosen because they might be more likely to be written by clinicians rather than appearing in templated text (common in VHA notes), and thus might have different sensitivity and specificity for firearm access. During annotation methods development, all snippets were double annotated by five physicians to calculate inter-annotator agreement (IAA). IAA exceeded 80% for all pairs of physicians, so all snippets in the analysis portion of this study were annotated by one of those five physicians.
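For illustration, pairwise percent agreement can be computed as in the following sketch, which assumes label lists aligned by snippet; this is not the study's actual IAA code:

```python
from itertools import combinations

# Toy sketch of pairwise inter-annotator agreement (percent agreement).
def pairwise_agreement(labels_by_annotator):
    scores = {}
    for (a, la), (b, lb) in combinations(labels_by_annotator.items(), 2):
        scores[(a, b)] = sum(x == y for x, y in zip(la, lb)) / len(la)
    return scores

labels = {
    "physician_1": ["definite access", "other", "definitely no access"],
    "physician_2": ["definite access", "other", "other"],
}
print(pairwise_agreement(labels))  # {('physician_1', 'physician_2'): ~0.67}
```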

Description of patient population and snippet corpus

We reported the percentage of patients for which each of the firearm terms appeared at least once during the 90-day period. We also reported the percentage of snippets that were initially identified by each of the firearm terms.

Development of automated text classifier models

Consistent with prior attempts to classify patients’ firearm access,13 we developed random forest models. We also trained three other nonneural classification models: bagging, gradient boosting, and logistic regression with ridge penalization. In addition, we implemented two domain-specific, pretrained BERT models: BioBERT and Bio-ClinicalBERT. The nonneural models were tuned using a 5-fold cross-validation scheme on 85% of the available data; the remaining 15% was used to evaluate each model after hyperparameter tuning. A hyperparameter space was defined for each model, and 10 random hyperparameter combinations were selected for model training. Evaluation was performed with the best hyperparameter combination by training the model on the full set of training data and evaluating it on the held-out test set. To address randomness, each model was trained with 10 different random seeds, with 10 hyperparameter combinations evaluated for each model.

BioBERT and Bio-ClinicalBERT were fine-tuned. Each model was trained on 2351 snippets spanning the “definite access”, “definitely no access”, and “other” categories. During training, 294 snippets were used for validation, and the trained model was tested on a separate set of 294 snippets. We used the same training, validation, and test data for all BioBERT and Bio-ClinicalBERT models. We used the AdamW optimizer with a learning rate of 2e−5.16 To assess the impact of the choice of initial starting parameters, we performed 100 iterations of this process using 100 different seeds. Seeds initialize a random number generator that chooses the initial values for parameters in each model; the same 100 seeds were used across all models. We then calculated mean performance and 95% confidence intervals across the 100 iterations.
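A minimal fine-tuning sketch for Bio-ClinicalBERT using the reported AdamW optimizer and 2e−5 learning rate; the checkpoint is the one cited in reference 20, while the toy batch, label encoding, and single optimization step are illustrative assumptions rather than reported details:

```python
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

torch.manual_seed(0)  # one seed; the study repeated this with 100 seeds

# Three-way snippet classifier head on top of the pretrained encoder
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "emilyalsentzer/Bio_ClinicalBERT", num_labels=3)
optimizer = AdamW(model.parameters(), lr=2e-5)  # learning rate from the paper

# Illustrative single training step on one toy snippet
batch = tokenizer(["Veteran reports keeping a gun at home."],
                  truncation=True, padding=True, return_tensors="pt")
labels = torch.tensor([0])  # hypothetical encoding: 0 = "definite access"
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```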

Model input for nonneural models. To convert text data into a format that can be input into the models, term frequency times inverse document frequency (TF-IDF) vectors were constructed from the snippets. Term frequency (TF) is the frequency of the term within the snippet (so if “gun” appears twice, the term frequency is two). Document frequency (DF) is the number of documents in the entire corpus in which the term appears. Inverse document frequency (IDF) is 1/DF (implementations typically use a log-scaled variant) and is used to down-weight terms that are extremely common. For example, the word “patient” might appear in almost all hospital notes, so the DF is very large and its inverse is very small. Multiplying TF by a very small number makes the TF-IDF low, suggesting the term is unimportant. Conversely, if TF is high but the term is rare across all patients (so DF is low and IDF is high), then the TF-IDF is high, suggesting the term is important for this specific patient. To convert text to terms, all unique trigrams were extracted from all snippets in the training dataset. A trigram is three consecutive words (eg, “gun at home”, “has no gun”). A sparse vector containing the count of each unique trigram present in a given snippet was constructed, and the TF-IDF value was then calculated for each unique trigram in each snippet.
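A minimal sketch of this featurization with scikit-learn (the package the study used for preprocessing and feature extraction); note that scikit-learn's IDF is a smoothed, log-scaled variant rather than the literal 1/DF described above, and the toy snippets are illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

snippets = ["veteran has a gun at home",
            "veteran denies access to a gun at home"]

# Trigram-only TF-IDF features, as described above
vectorizer = TfidfVectorizer(ngram_range=(3, 3))
X = vectorizer.fit_transform(snippets)  # sparse matrix: snippets x trigrams

print(vectorizer.get_feature_names_out())  # eg, 'gun at home', 'has a gun'
print(X.shape)                             # (n_snippets, n_unique_trigrams)
```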

Model output. The output of the models was assignment of each snippet to one of three classes: “definite access”, “definitely no access”, and “other”.

Model descriptions. Random forest models are composed of multiple decision trees, each trained on a different subsample of the training data. By training multiple trees on subsamples and aggregating their outputs, prediction variance is reduced, and the models are less prone to overfitting the training data than a lone decision tree trained on all of the data. Trees are individually fit using the “best split” principle, which involves evaluating different splits of the data at each branch and selecting the split that is optimal with respect to a predetermined criterion.17

A bagging classifier is similar to a random forest in that it is an ensemble of other models whose outputs are averaged to obtain a final prediction. In the case of our bagging classifier, a decision tree was used as the base classifier. Both approaches produce subsamples by sampling with replacement; the main difference is that the individual trees in a bagging model use all variables to determine splits, whereas the trees in a random forest consider only a random subset of variables at each split. We included this model because it can reduce possible overfitting.17

The gradient boosting model is essentially a sequence of ensembled weak models. Instead of training the weak models independently and aggregating them at the prediction step, gradient boosting adds weak models sequentially, each trying to correct the errors of the previous model in the chain. By doing so, gradient boosting focuses on improving the mistakes of the previous models. We included this model because it can reduce both variance and bias.17

The logistic regression with ridge penalization learns weights for a linear combination of variables and transforms the output into a normalized probability distribution over the classes. To reduce the risk of overfitting, we applied an L2 (or ridge) penalty to the model weights.18 We chose this model because it is commonly used for classification.
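The four nonneural models correspond to standard scikit-learn estimators; a minimal sketch of how they could be instantiated (the hyperparameter values shown are placeholders, not the tuned values from this study):

```python
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression

models = {
    # Ensemble of trees, each split on a random subset of variables
    "random forest": RandomForestClassifier(n_estimators=200),
    # Ensemble of decision trees (the default base estimator), all variables per split
    "bagging": BaggingClassifier(n_estimators=200),
    # Sequentially added weak learners, each correcting the last
    "gradient boosting": GradientBoostingClassifier(n_estimators=200),
    # L2 ("ridge") penalty is scikit-learn's default for logistic regression
    "logistic regression": LogisticRegression(penalty="l2", max_iter=1000),
}
# Each model exposes the same interface: model.fit(X_train, y_train), then
# model.predict(X_test) returns one of the three access classes per snippet.
```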

Bidirectional Encoder Representations from Transformers (BERT) is built on the transformer architecture; BERT uses the transformer’s encoder, which processes text input, topped by a task-specific output layer that generates predictions. The encoder is considered bidirectional because it reads the entire sequence of words at once, capturing contextual relations between words far better than previous unidirectional models. There are now many versions of BERT created to address specific domains. Specifically, BioBERT is a version of BERT that was further pretrained on a large biomedical corpus,19 and Bio-ClinicalBERT was initialized from BioBERT and further pretrained on clinical notes from an intensive care unit database.20

Hyperparameter selection for nonneural models. In an effort to produce the best models for a strong baseline, a hyperparameter space was defined for each model based on its unique parameters. A hyperparameter is a parameter of the model that is set before model training begins (eg, decision tree depth, number of models in the ensemble).21 During a single training session, 10 hyperparameter combinations were randomly chosen for each model; a random hyperparameter search was chosen because an exhaustive grid search is considerably more computationally intensive. We used 5-fold cross-validation. For each fold, the 10 hyperparameter combinations were evaluated, and of the 50 possibilities (5 folds × 10 combinations), the model with the best F1 score was kept. This process was performed 100 times, using the same 100 seeds we used for our BERT models to seed the random number generator.
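A sketch of such a search for one model using scikit-learn's RandomizedSearchCV; note that scikit-learn selects the combination with the best mean score across folds, a slight variation on keeping the single best fold-by-combination result described above, and the parameter distributions are illustrative:

```python
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Illustrative hyperparameter space; the study's exact space is not published
param_space = {"n_estimators": randint(50, 500),
               "max_depth": randint(2, 30)}

search = RandomizedSearchCV(
    RandomForestClassifier(),
    param_distributions=param_space,
    n_iter=10,               # 10 random hyperparameter combinations
    cv=5,                    # 5-fold cross-validation
    scoring="f1_weighted",   # weighted F1, as reported in the paper
    random_state=0,          # one of the seeds controlling the random draw
)
# search.fit(X_train, y_train)        # X_train: TF-IDF vectors; y_train: labels
# best_model = search.best_estimator_
```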

Model evaluation. Precision, also known as positive predictive value, is the percentage of snippets labeled as being in a category (such as “definite access”) that are actually in that category, as judged by the physician annotators. It is calculated as true positives/(true positives + false positives). Recall, also known as sensitivity, is the percentage of snippets in a category that are labeled as being in that category by a model. It is calculated as true positives/(true positives + false negatives). The F1 score summarizes these two measures and is calculated as the harmonic mean of precision and recall. We report the weighted-average version of precision, recall, and F1, with the weights being the proportions of testing snippets annotated as “definite access”, “definitely no access”, or “other”. Bootstrapping with 100 cycles of random sampling of the testing set was used to calculate confidence intervals (CIs) on the F1 scores.22 The Python scikit-learn package was used for all preprocessing, feature extraction, model training, and evaluation.23 Paired t-tests were performed on the average weighted F1 scores to determine statistically significant differences in performance between models.
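A minimal sketch of the weighted metrics and bootstrap CI using scikit-learn and NumPy; the toy labels are illustrative, and the study's exact bootstrap procedure may differ in detail:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Toy test-set labels; in the study these are annotator vs model labels
y_true = np.array(["access", "other", "no access", "other", "access", "other"])
y_pred = np.array(["access", "other", "other", "other", "access", "other"])

p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")

# 100 bootstrap cycles over the test set, as described above
rng = np.random.default_rng(0)
boot_f1 = []
for _ in range(100):
    idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
    boot_f1.append(precision_recall_fscore_support(
        y_true[idx], y_pred[idx], average="weighted", zero_division=0)[2])
print(np.percentile(boot_f1, [2.5, 97.5]))  # 95% CI on the weighted F1
```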

Results

We randomly selected 88 863 patients from the population of approximately 6.6 million veterans receiving care at the VHA during the study period. A firearm term was identified in the clinical notes of 36 685 patients (41.3%). The mean age of patients whose record included a firearm term was 56 years; 12.7% were identified as female; 67.2% were identified as white and 84.9% were identified as non-Hispanic. A total of 84 474 text snippets were generated from 1 798 819 notes that included at least one firearm term.

Firearm term prevalence

Among the 36 685 patients with at least one firearm term identified in their clinical record, the most common terms were gun (appeared in at least one snippet for 88.2% of patients), firearms (14.5%), shooting (6.7%), guns (6.6%), ammo (5.2%), and firearm (4.2%) (Table 1). These six terms accounted for 89.4% of all identified snippets, although more than one term could appear within a specific snippet. Of the 48 total terms that appeared at least once in our note sample, individual term prevalence ranged from less than 0.003% to 88.2% (full results, Supplementary Table S1).

Table 1. Prevalence of the top six terms used to identify snippets at the patient and snippet level in the 90 days preceding the index date.

Firearm term^a    Patients with ≥1 mention, n (% of 36 685)    Snippets with ≥1 mention, N (% of 83 474)^b
Gun               32 364 (88.22)                               46 738 (55.99)
Firearms          5334 (14.54)                                 12 824 (15.36)
Shooting          2469 (6.73)                                  3913 (4.69)
Guns              2436 (6.64)                                  4601 (5.51)
Ammo              1924 (5.24)                                  3494 (4.19)
Firearm           1546 (4.21)                                  3016 (3.61)

^a More than one term may appear in a patient’s record.
^b Counts represent the number of snippets that were generated using that term; snippets may include more than one term.

Additional results for all 48 terms identified in our sample are shown in Supplementary Table S1.

Annotation classification results

Of the 3000 snippets annotated, 33.7% were labeled as “definite access”, 9.0% as “definitely no access”, and 57.2% as “other”.

Results of automated text classifier modeling

Table 2 shows model performance for the task of classifying snippets into “definite access”, “definitely no access”, or “other”. Of the four nonneural models, the random forest model had the lowest F1 score (P < .05 for comparison to all other models). The logistic regression model had statistically significantly superior performance to the bagging and gradient boosting models (P < .05 for both comparisons), although the absolute difference between the logistic regression and the bagging and gradient boosting models was much smaller than the difference between those three and the random forest model. The BERT models, however, outperformed the nonneural models while exhibiting comparable performance to each other.

Table 2. Comparison of model performance for classifying snippets into “definite access”, “definitely no access”, or “other”.

Model type            Weighted average precision    Weighted average recall    Weighted average F1 score (95% CI^a)
Random forest         0.711                         0.634                      0.543 (0.536-0.551)
Bagging               0.813                         0.814                      0.811 (0.807-0.815)
Gradient boosting     0.819                         0.820                      0.816 (0.811-0.820)
Logistic regression   0.834                         0.835                      0.832 (0.828-0.835)
BioBERT               0.883                         0.875                      0.876 (0.874-0.879)
Bio-ClinicalBERT      0.901                         0.896                      0.896 (0.894-0.899)

^a Bootstrap confidence interval (CI).

Discussion

In this study of a national sample of VHA patients, we identified firearm-related terminology in 41.3% of clinical records within a 3-month period. After human annotators categorized snippets into three categories of firearm access that are informative for risk prediction and clinical interventions, we tested several modeling approaches to compare their performance in automatically classifying firearm mentions into those categories. We identified several models with F1 scores above 0.8, which is considered good performance for the task of assigning snippets to the three firearm access categories. These findings are highly informative for efforts aimed at identifying important suicide risk factors, such as firearm access, using unstructured data elements.

Our study builds on that of Brandt et al.13 Examining the 2012-2017 period, they reported that less than 10% of patients had firearm-related text in their first year of notes. Notably, this period preceded much of VA’s expansion of efforts to assess and document firearm access as part of its comprehensive approach to suicide prevention. By 2017, they found that 21% of patients had firearm-related content over a 12-month period. In comparison, we found that 41.3% of patients had such content over a 3-month period. This higher prevalence could reflect increasing efforts to conduct evidence-informed interventions specific to firearms, such as training clinicians to assess and document access or embedding templated prompts about firearm access in suicide risk assessments and interventions. We may also have identified more patients with firearm-related terminology because we searched for records using more firearm-related terms (Brandt et al., 27 terms; current study, 48), although nearly all snippets and patients were identified using common terms that were also included in the search strategy of Brandt et al. For example, like Brandt et al., we found that the vast majority of mentions of firearm-related terms were accounted for by variants of “gun” and “firearm”. Both studies also found many instances in which annotators could not make a determination about firearm access. Finally, the precision and recall of our random forest model were similar to theirs (Brandt et al., 0.692 and 0.571, respectively; our model, 0.711 and 0.634, respectively).

In addition, we extend the literature by comparing five other approaches to text classification with a random forest model (the type of model used by Brandt et al.). We found that all five performed better, at least in our sample and with a training set of the size we used. This may be in part because, compared with the other models, random forests are especially sensitive to class imbalance because of the way decision trees work. When training, a decision tree looks for the splits that will correctly classify most of the data points, which depends on the frequency of data points from each class. When generating subsamples, it becomes more likely that a subsample is devoid of an entire class as that class’s proportion of the total data distribution tends to zero. If a tree does not see an entire class of data during training, it will have high variance when making predictions on data from that class. If enough trees are trained without exposure to an entire data class, the aggregate prediction from the forest has higher variance and is likely to perform poorly when predicting those classes. Therefore, it is possible that, with a larger number of training snippets, the performance of a random forest model would improve. However, because we used only 62% more snippets than Brandt et al. (3000 vs 1856), our two studies cannot show whether substantially larger training sets would have a greater effect on random forest performance.
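To make the class-imbalance point concrete, the probability that a subsample of size n contains no example of a class with proportion p is (1 − p)^n, which grows rapidly as p shrinks for a fixed n; a toy calculation (p values echo our annotated class proportions, but the subsample sizes are illustrative):

```python
# Probability that a subsample of size n misses a class of proportion p
for p in (0.337, 0.090, 0.010):       # eg, 9.0% ~ "definitely no access"
    for n in (10, 50, 200):
        print(f"p={p:.3f}, n={n}: P(class absent) = {(1 - p) ** n:.2e}")
```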

Nonetheless, these five text classification models are quite promising, because their higher recall (ie, sensitivity) means that more patients with firearm access could be identified and provided with clinical interventions focused on firearm suicide risk. Importantly, their higher precision (positive predictive value) also means that patients expected to have firearm access are more likely to actually have it, helping ensure that resources are directed efficiently to the appropriate patients.

We found that the domain-specific pretrained BERT models offered statistically significantly superior results to the bagging, gradient boosting, and logistic regression with ridge penalization models. However, these results should be tested further for several reasons. First, the differences in performance are not large and may reflect the sample studied or other idiosyncrasies of our study design. Second, the models’ F1 scores might converge with more annotated data.

These findings suggest important areas for future work. The first relates to the large proportion of “other” snippets that both we and Brandt et al. report. We identified many cases in which clinicians assess access but do not clearly document a determination about firearm access. It is possible that simple documentation improvements through clinician training or template modification could improve risk ascertainment. A second area of future work relates to text classification research. Investigators could create a larger corpus of snippets or explore using full notes. Extending our assessment of whether elevated-risk patients have notes richer in firearm content could help investigators compile those larger and more relevant corpora. Similarly, and especially if larger corpora of annotations were available, other machine learning classification models that rely on neural networks, including large language models (LLMs), could be applied. Finally, it will be imperative to determine whether model performance varies among patient subgroups, such as by gender, age, race, ethnicity, or underlying mental health status. Addressing this question might require oversampling of these groups and accounting for the possibility of documentation differences based on the clinical settings in which subgroups are seen.

This study has limitations. First, it addresses classification of firearm access among randomly selected patients; clinicians caring for patients with identified suicide risk might document firearm access more clearly, and NLP models might therefore perform differently. Second, these methods may be sensitive to the clinical setting, and settings outside the VHA may have different documentation practices. Such differences might include the frequency of template use and the wording or structure of those templates, or could even arise from differences in how clinicians are trained to document a topic. This may limit the generalizability of our findings to other settings.

Conclusion

Understanding which patients have access to firearms is necessary for delivering patient-centered suicide prevention care. Given that there are limited existing mechanisms to identify firearm access using structured data elements, understanding how to evaluate access from text could be a critical step toward preventing suicides. However, we find that whether a patient has access still cannot be determined for most patients, even though many patients have notes with firearm content. We describe models that perform better than previously reported random forest models, at least in our study setting and with our study design. Future research could address whether our findings can be replicated in other clinical settings and whether model performance is different in important subpopulations than in the random sample of patients we included. In addition, operational efforts could aim to improve the clarity of documentation about firearm access which would ultimately improve model performance.


Contributor Information

Joshua Trujeque, Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, Department of Medicine, University of Minnesota Medical School, Minneapolis, MN 55455, United States.

R Adams Dudley, Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, Department of Medicine, University of Minnesota Medical School, Minneapolis, MN 55455, United States; Center for Care Delivery and Outcomes Research (CCDOR), Veterans Affairs (VA) Minneapolis Healthcare System, Minneapolis, MN 55417, United States; Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States; School of Public Health, University of Minnesota, Minneapolis, MN 55455, United States.

Nathan Mesfin, Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, Department of Medicine, University of Minnesota Medical School, Minneapolis, MN 55455, United States.

Nicholas E Ingraham, Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, Department of Medicine, University of Minnesota Medical School, Minneapolis, MN 55455, United States.

Isai Ortiz, Medical School, University of Minnesota, Minneapolis, MN 55455, United States.

Ann Bangerter, Center for Care Delivery and Outcomes Research (CCDOR), Veterans Affairs (VA) Minneapolis Healthcare System, Minneapolis, MN 55417, United States.

Anjan Chakraborty, Center for Care Delivery and Outcomes Research (CCDOR), Veterans Affairs (VA) Minneapolis Healthcare System, Minneapolis, MN 55417, United States.

Dalton Schutte, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States.

Jeremy Yeung, Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States.

Ying Liu, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States.

Alicia Woodward-Abel, Center for Care Delivery and Outcomes Research (CCDOR), Veterans Affairs (VA) Minneapolis Healthcare System, Minneapolis, MN 55417, United States.

Emma Bromley, Center for Care Delivery and Outcomes Research (CCDOR), Veterans Affairs (VA) Minneapolis Healthcare System, Minneapolis, MN 55417, United States.

Rui Zhang, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States.

Lisa A Brenner, Rocky Mountain Mental Illness Research, Education and Clinical Center for Suicide Prevention, Rocky Mountain Regional VAMC (RMR VAMC), Aurora, CO 80045, United States; Department of Physical Medicine and Rehabilitation, University of Colorado Anschutz School of Medicine, Aurora, CO 80045, United States; Department of Neurology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045, United States; Department of Psychiatry, University of Colorado Anschutz School of Medicine, Aurora, CO 80045, United States.

Joseph A Simonetti, Rocky Mountain Mental Illness Research, Education and Clinical Center for Suicide Prevention, Rocky Mountain Regional VAMC (RMR VAMC), Aurora, CO 80045, United States; Firearm Injury Prevention Initiative, University of Colorado Anschutz School of Medicine, Aurora, CO 80045, United States.

Author contributions

Joshua Trujeque and R. Adams Dudley are co-first authors. All authors made substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work. All authors were involved in drafting the work or reviewing it critically for important intellectual content. All authors gave final approval of the version to be published. All authors are in agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Supplementary material

Supplementary material is available at Journal of the American Medical Informatics Association online.

Funding

This work was supported by the U.S. Department of Veterans Affairs Office of Suicide Prevention (no grant number is assigned). The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs or the United States government.

Conflicts of interest

J.T. is supported by NIH NHLBI T32 HL07741. R.A.D. reports grants from NIH NIDDK (R01DK115629), AHRQ (P30HS029744), VA HSR&D, and the Centers for Disease Control and Prevention Office of Forecasting and Analysis. N.E.I. is supported by NIH NHLBI K23HL166783. N.M.M. is supported jointly by the University of Minnesota Medical School and Clinical and Translational Science Institute (CTSI) early career research award. R.Z. is supported by NCI R01CA287413, NIA R01AG078154, NCCIH R01AT009457, and NIHMD R21MD019134. Y.L. is supported by 2R01AT009457, 1R01AG078154, and 1R01CA287413. J.A.S. is supported by Career Development Award #1IK2HX002861-01A2 from the United States Department of Veterans Affairs, Health Services Research and Development Service. L.A.B. reports grants from the VA, DOD, NIH, and the State of Colorado, editorial remuneration from Wolters Kluwer and the Rand Corporation, and royalties from the American Psychological Association and Oxford University Press. In addition, L.A.B. consults with sports leagues via her university affiliation.

Data availability

The data underlying this article cannot be shared publicly because it is protected patient data.

References

1. Web-based Injury Statistics Query and Reporting System (WISQARS). National Center for Injury Prevention and Control, CDC. http://www.cdc.gov/injury/wisqars/index.html
2. U.S. Department of Veterans Affairs, Office of Mental Health and Suicide Prevention. 2023 National Veteran Suicide Prevention Annual Report. 2023. Accessed February 9, 2024. https://www.mentalhealth.va.gov/docs/data-sheets/2023/2023-National-Veteran-Suicide-Prevention-Annual-Report-FINAL-508.pdf
3. Anglemyer A, Horvath T, Rutherford G. The accessibility of firearms and risk for suicide and homicide victimization among household members: a systematic review and meta-analysis. Ann Intern Med. 2014;160(2):101-110.
4. Miller M, Azrael D, Hemenway D. Firearm availability and unintentional firearm deaths. Accid Anal Prev. 2001;33(4):477-484.
5. Miller M, Swanson SA, Azrael D. Are we missing something pertinent? A bias analysis of unmeasured confounding in the firearm-suicide literature. Epidemiol Rev. 2016;38(1):62-69.
6. National Academies of Sciences, Engineering, and Medicine; Health and Medicine Division; Board on Population Health and Public Health Practice. Health Systems Interventions to Prevent Firearm Injuries and Death: Proceedings of a Workshop. Wojtowicz A, French M, Alper J, eds. National Academies Press; 2019. https://doi.org/10.17226/25354
7. Ahmedani BK, Simon GE, Stewart C, et al. Health care contacts in the year before suicide death. J Gen Intern Med. 2014;29(6):870-877.
8. Studdert DM, Zhang Y, Swanson SA, et al. Handgun ownership and suicide in California. N Engl J Med. 2020;382(23):2220-2229.
9. Laguado S, Steavenson R, Mehvar M. Areas of improvement in suicide risk identification, assessment, and risk mitigation documentation by mental health prescribers at a Veterans Affairs health care system. Adm Policy Ment Health. 2021;48(4):633-638.
10. Boggs JM, Richards J, Simon G, et al. Suicide screening, risk assessment, and lethal means counseling during Zero Suicide implementation. Psychiatr Serv. 2024;appips20230211.
11. Richards JE, Kuo ES, Whiteside U, et al. Patient and clinician perspectives of a standardized question about firearm access to support suicide prevention: a qualitative study. JAMA Health Forum. 2022;3(11):e224252.
12. Boggs JM, Quintana LM, Powers JD, Hochberg S, Beck A. Frequency of clinicians’ assessments for access to lethal means in persons at risk for suicide. Arch Suicide Res. 2022;26(1):127-136.
13. Brandt CA, Workman TE, Farmer MM, et al. Documentation of screening for firearm access by healthcare providers in the Veterans Healthcare System: a retrospective study. West J Emerg Med. 2021;22(3):525-532.
14. Cleveland EC, Azrael D, Simonetti JA, Miller M. Firearm ownership among American veterans: findings from the 2015 National Firearm Survey. Inj Epidemiol. 2017;4(1):33.
15. Simonetti JA, Azrael D, Zhang W, Miller M. Receipt of clinician-delivered firearm safety counseling among U.S. veterans: results from a 2019 national survey. Suicide Life Threat Behav. 2022;52(6):1121-1125.
16. Loshchilov I, Hutter F. Decoupled weight decay regularization. 2019. https://doi.org/10.48550/arXiv.1711.05101
17. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer; 2009. https://doi.org/10.1007/978-0-387-84858-7
18. Marafino BJ, Park M, Davies JM, et al. Validation of prediction models for critical care outcomes using natural language processing of electronic health record data. JAMA Netw Open. 2018;1(8):e185097.
19. Lee J, Yoon W, Kim S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36(4):1234-1240.
20. ClinicalBERT - Bio + Clinical BERT Model. Accessed May 31, 2024. https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT
21. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res. 2012;13:281-305.
22. DiCiccio TJ, Efron B. Bootstrap confidence intervals. Stat Sci. 1996;11(3):189-212.
23. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825-2830.
