PLOS One. 2021 Jul 16;16(7):e0254417. doi: 10.1371/journal.pone.0254417

Not discussed: Inequalities in narrative text data for suicide deaths in the National Violent Death Reporting System

Briana Mezuk 1,2,*, Viktoryia A Kalesnikava 1, Jenni Kim 3, Tomohiro M Ko 1,4, Cassady Collins 1
Editor: Ellen L Idler 5
PMCID: PMC8284808  PMID: 34270588

Abstract

Background

The rate of suicide in the US has increased substantially in the past two decades, and new insights are needed to support prevention efforts. The National Violent Death Reporting System (NVDRS), the nation’s most comprehensive registry of suicide mortality, has qualitative text narratives that describe salient circumstances of these deaths. These texts have great potential for providing novel insights about suicide risk but may be subject to information bias.

Objective

To examine the relationship between decedent characteristics and the presence and length of NVDRS text narratives (separately for coroner/medical examiner (C/ME) and law enforcement (LE) reports) among 233,108 suicide and undetermined deaths from 2003–2017.

Methods

Generalized estimating equations (GEE) logistic and quasi-Poisson modeling was used to examine variation in the narratives (proportion of missing texts and character length of the non-missing texts, respectively) as a function of decedent age, sex, race/ethnicity, education, marital status, military history, and homeless status. Models adjusted for site, year, location of death, and autopsy status.

Results

The frequency of missing narratives was higher for LE vs. C/ME texts (19.8% vs. 5.2%). Decedent characteristics were not consistently associated with missing text across the two types of narratives (e.g., Black decedents were more likely to be missing the LE narrative but less likely to be missing the C/ME narrative relative to non-Hispanic whites). Conditional on having a narrative, C/ME narratives were significantly longer than LE narratives (822.44 vs. 780.68 characters). Decedents who were older, male, less educated, or members of some racial/ethnic minority groups had shorter narratives (both C/ME and LE) than younger, female, more educated, and non-Hispanic white decedents.

Conclusion

Decedent characteristics are significantly related to the presence and length of narrative texts for suicide and undetermined deaths in the NVDRS. Findings can inform future research using these data to identify novel determinants of suicide mortality.

Introduction

In the US, the rate of suicide has increased by more than a third since 1999 [1], despite ongoing and renewed efforts by governmental and non-governmental stakeholders to support research on developing more effective prevention measures [2–5]. Leaders in the field have argued:

“By and large, the [suicidal thoughts and behaviors (STB)] risk factor field appears to have conducted essentially the same studies over and over again throughout the last 50 years. In light of this pattern, it is not surprising that predictive ability has remained nearly constant over the last 50 years” [6].

This critique calls for new conceptual models, data sources and analytic approaches to understanding suicidal behavior, with attention to identifying modifiable determinants over the life course.

The National Violent Death Reporting System (NVDRS) is a state-based mortality registry implemented by the CDC that seeks to link “information about the “who, when, where, and how” from data on violent deaths [suicide, homicide, accidental firearm] and provides insights about “why” they occurred” [7, 8]. It is the most comprehensive surveillance system of the circumstances surrounding suicide mortality in the US, and it has recently been expanded to cover all 50 states [9]. The rationale for this rich data source is to enhance investigations that seek to clarify the circumstances and help discern contributing factors for completed suicide. Such understanding is a critical tool in improving prevention efforts at the population scale [10].

A unique feature of the NVDRS, distinct from other mortality registries, is that most cases are accompanied by a textual “narrative” abstracted by NVDRS staff using original source documents including death scene investigations, interviews with people who knew the decedent, contents of suicide notes, autopsy reports, and related sources [8]. Each case in the registry has multiple narratives: one is primarily derived from coroner or medical examiner reports and a second is primarily derived from law enforcement investigations. These narratives thus provide qualitative textual evidence on a population scale. Previously, qualitative text data in suicide research was generally limited to small psychological autopsy studies [11] or interviews with people who had survived a suicide attempt [12]. However, a handful of studies have begun using these NVDRS text data, some leveraging analytic tools appropriate for manipulating large amounts of text such as natural language processing (NLP) algorithms [13] but most applying traditional qualitative approaches (i.e., content analysis) to smaller subsets of the registry [14–18].

Regardless of the analytic approach used, any effort to draw inferences from the NVDRS narratives needs to be made with careful consideration of potential biases and limitations in data collection and measurement. From a data quality perspective, the NVDRS texts are unique, as they are explicitly written for research purposes by centrally-trained staff. NVDRS staff undergo regular training to enhance consistency of abstraction, and state data are reviewed centrally by CDC staff before they are made available to external investigators [19, 20]. However, these narratives may still be subject to measurement error which could bias inferences [21]. For example, if there are systematic patterns in the amount or quality of text written about each case as a function of decedent characteristics (e.g., age or race), this information bias would impact the validity of any conclusions drawn about how suicide mortality varies over the life course or how established risk factors for suicide (e.g., depression, substance misuse) relate to racial differences in suicide risk, respectively. Investigators need to understand the strengths and limitations of these narrative texts to appropriately account for any such sources of bias in their empirical research.

We aim to further the scientific conversation about the NVDRS and to harness its utility as a tool for informing suicide prevention efforts. Therefore, we investigated the relationship between decedent characteristics and length of NVDRS text narratives from nearly 240,000 suicide and undetermined deaths from 2003 to 2017. The length of the narrative is used as a proxy for the information potential of the text [22]. These findings can inform the work of investigators in their efforts to identify novel risk and protective factors for suicide.

Methods

Data source and elements

The NVDRS registry is publicly available through the CDC’s Web-based Injury Statistics Query and Reporting System (WISQARS) [23]; however, the text narrative elements are only available to external investigators through a restricted-access data use agreement. We obtained NVDRS Restricted-Access Data (RAD) from the CDC in May 2020 using their application procedures [8, 10]. This dataset consisted of 239,716 deaths of all ages from suicide (including multiple suicides, and homicide followed by suicide), accidental firearm, and undetermined cause from 37 NVDRS sites (AK, AZ, CA, CO, CT, DE, DC, GA, HI, IL, IN, IA, KS, KY, ME, MD, MA, MI, MN, NV, NH, NJ, NM, NY, NC, OH, OK, OR, PA, RI, SC, UT, VT, VA, WA, WV, and WI), as well as Puerto Rico, from 2003 to 2017.

All data in the NVDRS registry, both quantitative variables and qualitative text narratives, are coded and written, respectively, by trained abstractors in each participating state [8, 10, 20, 24]. All data are generated using original source documents (death certificates, coroner or medical examiner reports, witness statements, law enforcement reports, scene investigations, etc.). These documents are converted into quantitative variables and qualitative text narratives using a common data entry system. The CDC provides centralized training for state abstractors, reviews the submitted data before it is released to external investigators, and has a set of quality assurance procedures to support reliable abstraction of documents across states and over time [19, 25].

Qualitative text narratives

This analysis used two types of narratives for each decedent: one primarily derived from coroner and medical examiner reports (C/ME), and one primarily derived from law enforcement investigations (LE). While these are both written by NVDRS staff and therefore should have similar information, we examined each type separately to assess the degree to which any patterns we observe regarding decedent characteristics are similar across the two texts. If the patterns are similar, this may reflect features of the centralized NVDRS system or general limitations in the accuracy and completeness of mortality documentation (i.e., lack of access to specific records by NVDRS staff, incomplete death certificates) [26]. If the patterns differ, this may reflect characteristics of the source documents (e.g., toxicology reports, police reports) or reporting procedures. For example, not all decedents undergo autopsy, and states vary in whether they have local or centralized coroner and/or medical examiner systems, both of which would primarily influence the C/ME narratives. In addition, while the overwhelming majority of deaths are investigated by local, rather than state or federal, law enforcement agencies, most NVDRS sites do not have a pre-existing information-sharing infrastructure that would enable the seamless transfer of source documents between these police departments and the state NVDRS abstractors [25]. The net result is that NVDRS staff often must foster relationships with local stakeholders that create the source documents used for data abstraction (i.e., coroners, police departments) to ensure complete reporting. This may introduce systematic state and chronological differences in the completeness and length of the narratives as NVDRS staff foster and build these partnerships over time.

Inclusion criteria

Exploratory analyses confirmed that narratives for multiple deaths (i.e., multiple suicides, homicide followed by suicide) were longer than those of single deaths, and therefore these cases were excluded from analysis (n = 4,361). Because our analysis is focused on suicide, accidental firearm deaths were also excluded (n = 2,247). Undetermined cause deaths were retained in the analysis to reflect potential misclassification of suicide [27, 28]. As illustrated by Fig 1, after these exclusions the analytic sample size was n = 233,108, which consisted of single suicide deaths (n = 195,343) and undetermined deaths (n = 37,765). This was the sample used in Analysis 1, which examined predictors of whether the decedent was missing a text narrative.
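The sample flow described above is simple arithmetic and can be checked directly. The sketch below only uses the counts reported in the text; the function and constant names are hypothetical and are not taken from the authors' analysis code (which was written in R).

```python
# Counts as reported in the "Data source and elements" and "Inclusion criteria" sections.
TOTAL_RAD_DEATHS = 239_716            # all deaths in the restricted-access file
EXCLUDED_MULTIPLE_DEATHS = 4_361      # multiple suicides, homicide followed by suicide
EXCLUDED_ACCIDENTAL_FIREARM = 2_247   # accidental firearm deaths

REPORTED_SINGLE_SUICIDES = 195_343
REPORTED_UNDETERMINED = 37_765


def analytic_sample_size(total, *exclusions):
    """Subtract each exclusion count from the starting total."""
    return total - sum(exclusions)


analysis_1_n = analytic_sample_size(
    TOTAL_RAD_DEATHS, EXCLUDED_MULTIPLE_DEATHS, EXCLUDED_ACCIDENTAL_FIREARM
)
print(analysis_1_n)  # 233108
# The two reported subgroups should add up to the same analytic sample.
print(REPORTED_SINGLE_SUICIDES + REPORTED_UNDETERMINED == analysis_1_n)  # True
```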

Fig 1. Flowchart of sample inclusion/exclusion criteria for analyses of narrative texts, National Violent Death Reporting System, 2003–2017.


Analysis 2 examined predictors of the length (in characters, including spaces) of the narrative. For this analysis, the sample was additionally limited to those cases in which the NVDRS coders indicated that “circumstances were known,” as the intent of the narrative is to provide a detailed description of these circumstances and the Data Users Guide specifies this condition should be applied. This resulted in the exclusion of an additional n = 27,317 cases. Through additional exploratory analyses we noted that there were several cases where the NVDRS data indicated that circumstances were “not known,” but the case still had a narrative of at least 31 characters in length. A description and examples of these narratives are provided in the S1 Appendix. This means the results of the analysis of narrative length presented here are likely conservative. Also, in the S1 Appendix we provide a random sample of 10 annotated examples of short (31 to <200 characters) and long (>500 characters) narratives, to illustrate the notion that longer texts have more information potential.

This project was approved by the CDC-NVDRS, and this analysis was deemed exempt from human subjects regulation by the Institutional Review Board at the University of Michigan.

Data access

The narrative data used in this analysis are available by request from the CDC through their restricted-access data process. Other NVDRS data are publicly-available: https://www.cdc.gov/violenceprevention/datasources/nvdrs/datapublications.html. Cells with <5 observations have been suppressed in this publication, as required by the NVDRS Data Use Agreement.

Predictors

The quantitative variables used in the regression analyses focused on seven decedent characteristics that are mandated on standard US Death Certificates [29]: age (coded as ≤18, 19–29, 30–39, 40–49, 50–59, 60–69, 70–79 and ≥80 years); sex (coded as female, male, or unknown); race/ethnicity (coded as American Indian/Alaska Native, Asian/Pacific Islander, Black/African American, Hispanic, Non-Hispanic White, Two or more races, Other, or unknown); marital status (coded as married/civil union/domestic partnership, separated/divorced, widowed, never married/single but not otherwise specified, or unknown); educational attainment (coded as 8th grade or less, 9th to 12th grade, high school diploma or GED, some college but no degree, associate’s degree, bachelor’s degree, master’s degree, doctorate/professional degree, or unknown); military status (yes, no or unknown); and homeless (yes, no or unknown). In addition, the regression models adjusted for whether an autopsy was performed (yes, no, or unknown); location of death (home, hospital, hospice/nursing home, other, or unknown); and year (2017 as the reference). These additional variables were included because exploratory analyses indicated they improved both absolute and relative model fit. Because the amount of missing data in these predictor variables was generally limited (see Table 1), we included a dummy code for “missing” for all predictors so that these observations were retained in the regression analyses. The exception to this was education level, which had substantial amounts of missingness; therefore, for this variable we conducted an additional analysis accounting for missing values using multivariate imputation by chained equations (30 datasets, 20 iterations). For all analyses, NVDRS site, which is an identification variable that reflects which state abstracted a particular case, was used as a clustering variable in the regression analyses, as described below.
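The category coding described above, including the separate “missing” level that keeps incomplete cases in the models, can be sketched as follows. This is an illustration only: the helper names and the label "unknown" are hypothetical, ages are assumed to be recorded in whole years, and `None` stands in for an unrecorded value.

```python
# Hypothetical helpers illustrating the predictor coding described in the text.
AGE_BANDS = [(19, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]


def age_group(age):
    """Map age in whole years to the eight bands listed in the Predictors section."""
    if age is None:
        return "unknown"
    if age <= 18:
        return "<=18"
    if age >= 80:
        return ">=80"
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age {age!r} is not a whole number of years")


def with_missing_level(value):
    """Give unrecorded values their own level so the case is retained."""
    return "unknown" if value is None or value == "" else value


def dummy_codes(value, levels, reference):
    """One 0/1 indicator per non-reference level (standard dummy coding)."""
    return {level: int(value == level) for level in levels if level != reference}


military_levels = ["no", "yes", "unknown"]
print(age_group(45))  # 40-49
print(dummy_codes(with_missing_level(None), military_levels, reference="no"))
# {'yes': 0, 'unknown': 1}
```

With this coding, a decedent whose military status was not recorded contributes to the "unknown" indicator rather than being dropped from the regression.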

Table 1. Decedent characteristics stratified by narrative missing status: Suicide and undetermined deaths in the National Violent Death Reporting System, 2003–2017.

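Columns (left to right): Total (N = 233108); C/ME Narratives, Not Missing (N = 221046); C/ME Narratives, Missing (N = 12062); LE Narratives, Not Missing (N = 186935); LE Narratives, Missing (N = 46173). Values are n (%) unless otherwise noted.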
Age (median [Q1; Q3]) 45 [16; 83] 45 [16; 83] 47 [16; 84] 45 [16; 83] 46 [15; 84]
Sex
 Female 57346 (24.6%) 54629 (24.7%) 2717 (22.5%) 45051 (24.1%) 12295 (26.6%)
 Male 175673 (75.4%) 166381 (75.3%) 9292 (77.0%) 141860 (75.9%) 33813 (73.2%)
 Unknown/Missing 89 (0.04%) 36 (<0.1%) 53 (0.4%) 24 (<0.1%) 65 (0.1%)
Race/Ethnicity
 White, non-Hispanic 191214 (82.0%) 181296 (82.0%) 9918 (82.2%) 154748 (82.8%) 36466 (79.0%)
 American Indian/Alaska Native 3122 (1.34%) 2855 (1.3%) 267 (2.2%) 2470 (1.3%) 652 (1.4%)
 Asian/Pacific Islander 3855 (1.65%) 3730 (1.7%) 125 (1.0%) 2966 (1.6%) 889 (1.9%)
 Black or African American 18280 (7.84%) 17528 (7.9%) 752 (6.2%) 13895 (7.4%) 4385 (9.5%)
 Hispanic 12289 (5.27%) 11786 (5.3%) 503 (4.2%) 9528 (5.1%) 2761 (6.0%)
 Other/Unspecified, non-Hispanic 772 (0.33%) 519 (0.2%) 253 (2.1%) 395 (0.2%) 377 (0.8%)
 Two or more races, non-Hispanic 3268 (1.40%) 3169 (1.4%) 99 (0.8%) 2830 (1.5%) 438 (0.9%)
 Unknown/Missing 308 (0.13%) 163 (0.1%) 145 (1.2%) 103 (0.1%) 205 (0.4%)
Education Level
 8th grade or less 9515 (4.08%) 8935 (4.0%) 580 (4.8%) 7259 (3.9%) 2256 (4.9%)
 9-12th grade, no diploma 24624 (10.6%) 23266 (10.5%) 1358 (11.3%) 20248 (10.8%) 4376 (9.5%)
 HS or GED 67738 (29.1%) 64203 (29.0%) 3535 (29.3%) 55200 (29.5%) 12538 (27.2%)
 Some college, no degree 26245 (11.3%) 25069 (11.3%) 1176 (9.7%) 21634 (11.6%) 4611 (10.0%)
 Associate degree 11708 (5.02%) 11107 (5.0%) 601 (5.0%) 9563 (5.1%) 2145 (4.6%)
 Bachelor’s degree 17330 (7.43%) 16647 (7.5%) 683 (5.7%) 14177 (7.6%) 3153 (6.8%)
 Master’s degree 5930 (2.54%) 5668 (2.6%) 262 (2.2%) 4809 (2.6%) 1121 (2.4%)
 Professional or Doctorate degree 2677 (1.15%) 2577 (1.2%) 100 (0.8%) 2199 (1.2%) 478 (1.0%)
 Unknown/Missing 67341 (28.9%) 63574 (28.8%) 3767 (31.2%) 51846 (27.7%) 15495 (33.6%)
Marital Status
 Married/In relationship 74183 (31.8%) 70101 (31.7%) 4082 (33.8%) 59275 (31.7%) 14908 (32.3%)
 Divorced/Separated 55480 (23.8%) 52718 (23.8%) 2762 (22.9%) 44955 (24.0%) 10525 (22.8%)
 Single/Never Married 87033 (37.3%) 83150 (37.6%) 3883 (32.2%) 70259 (37.6%) 16774 (36.3%)
 Widowed 12832 (5.50%) 12071 (5.5%) 761 (6.3%) 10041 (5.4%) 2791 (6.0%)
 Unknown/Missing 3580 (1.54%) 3006 (1.4%) 574 (4.8%) 2405 (1.3%) 1175 (2.5%)
Military
 No 177732 (76.2%) 169255 (76.6%) 8477 (70.3%) 143528 (76.8%) 34204 (74.1%)
 Yes 37527 (16.1%) 35396 (16.0%) 2131 (17.7%) 30638 (16.4%) 6889 (14.9%)
 Unknown/Missing 17849 (7.66%) 16395 (7.4%) 1454 (12.1%) 12769 (6.8%) 5080 (11.0%)
Homeless
 No 218557 (93.8%) 211183 (95.5%) 7374 (61.1%) 179221 (95.9%) 39336 (85.2%)
 Yes 3083 (1.32%) 3023 (1.4%) 60 (0.5%) 2601 (1.4%) 482 (1.0%)
 Unknown/Missing 11468 (4.92%) 6840 (3.1%) 4628 (38.4%) 5113 (2.7%) 6355 (13.8%)
Autopsy Performed
 No 97505 (41.8%) 91110 (41.2%) 6395 (53.0%) 77869 (41.7%) 19636 (42.5%)
 Yes 133969 (57.5%) 128842 (58.3%) 5127 (42.5%) 108168 (57.9%) 25801 (55.9%)
 Unknown/Missing 1634 (0.70%) 1094 (0.5%) 540 (4.5%) 898 (0.5%) 736 (1.6%)
Place of Death
 Home 128517 (55.1%) 122725 (55.5%) 5792 (48.0%) 107318 (57.4%) 21199 (45.9%)
 Hospice or LTC 2117 (0.91%) 1898 (0.9%) 219 (1.8%) 893 (0.5%) 1224 (2.7%)
 Hospital 40215 (17.3%) 37988 (17.2%) 2227 (18.5%) 29165 (15.6%) 11050 (23.9%)
 Other 60327 (25.9%) 57511 (26.0%) 2816 (23.3%) 48691 (26.0%) 11636 (25.2%)
 Unknown/Missing 1932 (0.83%) 924 (0.4%) 1008 (8.4%) 868 (0.5%) 1064 (2.3%)
Circumstances known
 No 27317 (11.7%) 19816 (9.0%) 7501 (62.2%) 13316 (7.1%) 14001 (30.3%)
 Yes 205791 (88.3%) 201230 (91.0%) 4561 (37.8%) 173619 (92.9%) 32172 (69.7%)

Analysis

We examined how the (i) percent of missing narratives and (ii) text character length among those with a non-missing narrative, for both C/ME and LE texts, varied as a function of decedent characteristics.

Analysis 1: Predictors of missing narratives

We conducted extensive exploratory analysis of the text narratives focused on the length of the C/ME and LE texts. While in most cases the narrative was simply missing (zero characters), in other cases the only text provided was “Not available,” “No report at this time,” or “N/A,” which are, in effect, missing values, as these texts were not describing salient characteristics that would be of interest to researchers. Therefore, we recoded all narratives with fewer than 31 characters (including spaces) to zero characters for analysis. After this recoding, there were 12,062 (5.2%) C/ME and 46,173 (19.8%) LE narratives treated as “missing” in the subsequent analysis; 6,170 observations (3%) were missing both C/ME and LE narratives. We then fit two logistic regression models (modeling C/ME and LE separately) to identify predictors of having a missing narrative (1 = missing, 0 = not missing), controlling for year, location of death, and autopsy status. There was significant clustering of the outcomes by site (intraclass correlation coefficient (ICC) for a missing narrative: C/ME = 0.57, LE = 0.48; ICC for narrative length: C/ME = 0.35, LE = 0.43). Therefore, we accounted for the clustering of observations within sites using Generalized Estimating Equations (GEE) modeling assuming an exchangeable correlation structure and a sandwich estimator to be robust against model misspecification [30]. GEE accounts for factors that cluster within sites (e.g., state demographic composition, C/ME system (centralized vs. local), abstractor experience). We also conducted a sensitivity analysis excluding sites with <5 observations missing a narrative (i.e., sites with nearly complete narrative data) to confirm that our analysis of missingness was not influenced by these sites.
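The recoding rule for placeholder narratives can be illustrated with a short sketch. The 31-character threshold and the counts come from the text; the function names are hypothetical, and the sketch assumes raw length is measured without trimming whitespace, which the text does not specify.

```python
MIN_INFORMATIVE_CHARS = 31  # threshold stated in the text; lengths include spaces


def narrative_length(text):
    """Character count including spaces; an absent narrative (None) has length 0."""
    return 0 if text is None else len(text)


def recoded_length(text, threshold=MIN_INFORMATIVE_CHARS):
    """Recode narratives shorter than the threshold to zero characters."""
    n_chars = narrative_length(text)
    return n_chars if n_chars >= threshold else 0


def is_missing(text, threshold=MIN_INFORMATIVE_CHARS):
    """Outcome for Analysis 1: True when the narrative is treated as missing."""
    return recoded_length(text, threshold) == 0


for placeholder in ["Not available", "No report at this time", "N/A", "", None]:
    print(repr(placeholder), is_missing(placeholder))  # True for every placeholder

# Reproduce the reported missingness percentages from the reported counts.
ANALYTIC_N = 233_108
for label, n_missing in [("C/ME", 12_062), ("LE", 46_173)]:
    print(label, f"{100 * n_missing / ANALYTIC_N:.1f}%")  # C/ME 5.2%, LE 19.8%
```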

Analysis 2: Predictors of the length of the narratives among cases whose narrative was not missing

The second analysis examined the predictors of the length of the C/ME and LE texts, as expressed by the count of characters (including spaces), conditional on having a non-missing narrative and having “known circumstances.” The condition of “known circumstances” was applied as directed in the RAD Data User Guide and resulted in 27,317 cases excluded from this analysis (Fig 1). We used GEE quasi-Poisson models, with an exchangeable correlation structure and sandwich estimator, to examine the relationship between decedent characteristics and the length of the narratives while controlling for year, location of death, and autopsy status, separately for C/ME and LE narratives. The quasi-Poisson model is appropriate for outcomes that are discrete integers (i.e., count of character length) and are over-dispersed (i.e., variance greater than the mean) [31], as is the case in the present analysis. We also conducted a sensitivity analysis by excluding observations in the top 1% of character length (separately for C/ME and LE) to confirm that our analysis of length was not influenced by these outlier observations.
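The overdispersion condition and the top-1% sensitivity exclusion can be illustrated with a small sketch. Two caveats apply: the variance-to-mean ratio below is only a crude unconditional check (a quasi-Poisson model estimates dispersion from the residuals of the fitted model), and the rank-based trimming rule is an assumption, since the text does not say how the top 1% was defined. The data are made up for illustration.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; a Poisson variable has a ratio near 1."""
    return variance(counts) / mean(counts)


def drop_longest_percent(lengths, percent=1):
    """Remove the longest `percent` of observations by rank (assumed trimming rule)."""
    n_drop = len(lengths) * percent // 100
    ordered = sorted(lengths)
    return ordered[: len(ordered) - n_drop]


# Hypothetical character counts: 198 typical narratives plus two very long ones.
toy_lengths = [31 + 7 * i for i in range(198)] + [9000, 9961]

print(dispersion_ratio(toy_lengths) > 1)   # True: variance far exceeds the mean
trimmed = drop_longest_percent(toy_lengths)
print(len(trimmed), max(trimmed))          # 198 1410
```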

Finally, we conducted two additional post-hoc sensitivity analyses for both the missing narratives and narrative length to confirm the robustness of our findings: (1) we additionally adjusted for presence of a toxicology report (coded yes vs. no/not applicable), which may result in longer narratives due to the description of substances, and (2) we re-ran all models excluding 30,094 undetermined deaths (that is, limiting the analysis to single-death suicide cases).

All analyses were conducted using R (version 4.0.2) and all p-values refer to two-tailed tests.

Results

Analysis 1: Predictors of missing narratives

Table 1 shows decedent characteristics of the overall analytic sample and stratified by whether their C/ME or LE narrative was missing. The sample was predominantly male and non-Hispanic white (NHW), with a median age of 46. Unsurprisingly, decedents whose characteristics were “unknown” were more likely to be missing narratives than those with valid data. However, even among decedents with known demographics there was variation in the number of missing narratives, although this variation was not always consistent across the two types of texts.

As shown by Fig 2 and S1 Table, after accounting for year, place of death, and autopsy status, there was a dose-response relationship between older age and relative odds of having a missing LE, but not C/ME, narrative. Women were more likely to be missing LE (odds ratio (OR): 1.12, 95% CI: 1.09–1.15), but not C/ME (OR: 1.01), narratives relative to men. Decedents who were American Indian/Alaska Native were more likely to be missing both C/ME (OR: 2.30, 95% CI: 1.79–2.95) and LE (OR: 1.65, 95% CI: 1.42–1.92) narratives relative to NHW, while decedents who were Asian/Pacific Islander, Black, or Hispanic were more likely to be missing LE narratives but less likely to be missing C/ME narratives relative to NHW. Decedents with more education were consistently less likely to have missing narratives (e.g., OR for doctorate vs. HS: 0.65, 95% CI: 0.48–0.88 for C/ME). Marital status and military history were not associated with missingness. As shown by S2 Table, the results of the sensitivity analysis excluding sites with <5 observations missing a narrative (i.e., nearly complete narrative data) were consistent with the main results.
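For orientation, a crude (unadjusted) odds ratio can be computed directly from the Table 1 counts. The sketch below does this for sex and the LE narrative. It is not the authors' GEE estimate: it ignores site clustering and all covariates, so it should only be read as a rough check on direction and magnitude. The function name is hypothetical.

```python
def odds_ratio(group_events, group_non_events, ref_events, ref_non_events):
    """Crude odds ratio from a 2x2 table: odds in the group divided by odds in the reference."""
    return (group_events / group_non_events) / (ref_events / ref_non_events)


# Table 1, LE narratives: missing vs. not missing, by sex.
crude_or_female_vs_male = odds_ratio(
    group_events=12_295,      # women, LE narrative missing
    group_non_events=45_051,  # women, LE narrative not missing
    ref_events=33_813,        # men, LE narrative missing
    ref_non_events=141_860,   # men, LE narrative not missing
)
print(round(crude_or_female_vs_male, 2))
```

The crude value comes out at roughly 1.14, in the same direction as the adjusted OR of 1.12 reported above.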

Fig 2. Forest plot of relative odds (95% confidence intervals) of missing C/ME and LE narrative texts associated with decedent characteristics, NVDRS 2003–2017.


Estimates are adjusted for all variables shown in the figure as well as year, location of death, and autopsy status and account for clustering within site using GEE with robust standard errors.

Analysis 2: Predictors of the length of narratives

Table 2 shows decedent characteristics as a function of narrative character length, which for ease of interpretation is stratified into tertiles, among those with “known” circumstances.
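The tertile stratification can be expressed as a simple lookup using the cut points reported in the Table 2 column headings. The function and constant names below are hypothetical; only the boundary values come from the table.

```python
MIN_NARRATIVE_CHARS = 31   # narratives shorter than this were treated as missing

# Upper bounds of the Short and Medium tertiles, from the Table 2 column headings.
CME_BOUNDS = (396, 659)
LE_BOUNDS = (402, 731)


def length_category(n_chars, bounds):
    """Assign a non-missing narrative to Short, Medium, or Long by character count."""
    if n_chars < MIN_NARRATIVE_CHARS:
        raise ValueError("narratives under 31 characters are treated as missing")
    short_max, medium_max = bounds
    if n_chars <= short_max:
        return "Short"
    if n_chars <= medium_max:
        return "Medium"
    return "Long"


print(length_category(396, CME_BOUNDS))  # Short
print(length_category(397, CME_BOUNDS))  # Medium
print(length_category(732, LE_BOUNDS))   # Long
```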

Table 2. Decedent characteristics stratified by narrative length: Suicide and undetermined deaths in the National Violent Death Reporting System, 2003–2017.

Columns (left to right): C/ME Narratives by character length: Short, 31–396 (N = 67119); Medium, 397–659 (N = 67193); Long, 660–9961 (N = 66918); LE Narratives by character length: Short, 31–402 (N = 57974); Medium, 403–731 (N = 57825); Long, 732–9985 (N = 57820). Values are n (%) unless otherwise noted.
Age (Median [Q1; Q3]) 45.0 [17.0;83.0] 46.0 [17.0;83.0] 45.0 [16.0;82.0] 46.0 [17.0;84.0] 46.0 [17.0;83.0] 44.0 [16.0;82.0]
Sex
 Female 14721 (21.9%) 16672 (24.8%) 18937 (28.3%) 13716 (23.7%) 13606 (23.5%) 15040 (26.0%)
 Male 52398 (78.1%) 50521 (75.2%) 47980 (71.7%) 44258 (76.3%) 44219 (76.5%) 42779 (74.0%)
 Unknown/Missing 0 (0.0%) 0 (0.0%) 1 (<0.1%) 0 (0.0%) 0 (0.0%) 1 (<0.1%)
Race/Ethnicity
 White 55677 (83.0%) 56071 (83.4%) 55252 (82.6%) 47565 (82.0%) 48854 (84.5%) 48614 (84.1%)
 American Indian/Alaska Native 482 (0.7%) 782 (1.2%) 1230 (1.8%) 497 (0.9%) 658 (1.1%) 1042 (1.8%)
 Asian/Pacific Islander 997 (1.5%) 1108 (1.6%) 1234 (1.8%) 911 (1.6%) 815 (1.4%) 1007 (1.7%)
 Black or African American 5780 (8.6%) 4851 (7.2%) 4023 (6.0%) 5355 (9.2%) 4027 (7.0%) 2638 (4.6%)
 Hispanic 3059 (4.6%) 3266 (4.9%) 4141 (6.2%) 2622 (4.5%) 2515 (4.3%) 3590 (6.2%)
 Other/Unspecified, non-Hispanic 112 (0.2%) 130 (0.2%) 144 (0.2%) 120 (0.2%) 86 (0.1%) 107 (0.2%)
 Two or more races, non-Hispanic 970 (1.4%) 966 (1.4%) 877 (1.3%) 876 (1.5%) 857 (1.5%) 809 (1.4%)
 Unknown/Missing 42 (0.1%) 19 (<0.1%) 17 (<0.1%) 28 (<0.1%) 13 (<0.1%) 13 (<0.1%)
Education Level
 8th grade or less 2449 (3.6%) 1988 (3.0%) 2134 (3.2%) 2167 (3.7%) 1755 (3.0%) 1679 (2.9%)
 9-12th grade, no diploma 6895 (10.3%) 6577 (9.8%) 7406 (11.1%) 6407 (11.1%) 6035 (10.4%) 5910 (10.2%)
 HS or GED 16687 (24.9%) 19852 (29.5%) 22417 (33.5%) 15448 (26.6%) 17871 (30.9%) 18372 (31.8%)
 Some college, no degree 5820 (8.7%) 7643 (11.4%) 9845 (14.7%) 5713 (9.9%) 6906 (11.9%) 7861 (13.6%)
 Associate degree 2640 (3.9%) 3148 (4.7%) 4495 (6.7%) 2268 (3.9%) 2895 (5.0%) 3861 (6.7%)
 Bachelor’s degree 4076 (6.1%) 5017 (7.5%) 6426 (9.6%) 3597 (6.2%) 4415 (7.6%) 5438 (9.4%)
 Master’s degree 1336 (2.0%) 1772 (2.6%) 2215 (3.3%) 1266 (2.2%) 1448 (2.5%) 1870 (3.2%)
 Professional or Doctorate degree 641 (1.0%) 774 (1.2%) 986 (1.5%) 631 (1.1%) 630 (1.1%) 814 (1.4%)
 Unknown/Missing 26575 (39.6%) 20422 (30.4%) 10994 (16.4%) 20477 (35.3%) 15870 (27.4%) 12015 (20.8%)
Marital Status
  Married/In relationship 22076 (32.9%) 22254 (33.1%) 20452 (30.6%) 18628 (32.1%) 18721 (32.4%) 18436 (31.9%)
 Divorced/Separated 15828 (23.6%) 16158 (24.0%) 16989 (25.4%) 14332 (24.7%) 14009 (24.2%) 14071 (24.3%)
 Single/Never Married 24418 (36.4%) 24307 (36.2%) 25347 (37.9%) 20676 (35.7%) 21226 (36.7%) 22211 (38.4%)
 Widowed 3862 (5.8%) 3765 (5.6%) 3421 (5.1%) 3527 (6.1%) 3232 (5.6%) 2618 (4.5%)
 Unknown/Missing 935 (1.4%) 709 (1.1%) 709 (1.1%) 811 (1.4%) 637 (1.1%) 484 (0.8%)
Military
 No 48265 (71.9%) 52402 (78.0%) 54355 (81.2%) 42370 (73.1%) 44656 (77.2%) 47069 (81.4%)
 Yes 11374 (16.9%) 11044 (16.4%) 10204 (15.2%) 10056 (17.3%) 9668 (16.7%) 8993 (15.6%)
 Unknown/Missing 7480 (11.1%) 3747 (5.6%) 2359 (3.5%) 5548 (9.6%) 3501 (6.1%) 1758 (3.0%)
Homeless
 No 64341 (95.9%) 65109 (96.9%) 64715 (96.7%) 55499 (95.7%) 56074 (97.0%) 56286 (97.3%)
 Yes 710 (1.1%) 835 (1.2%) 1169 (1.7%) 791 (1.4%) 760 (1.3%) 819 (1.4%)
 Unknown/Missing 2068 (3.1%) 1249 (1.9%) 1034 (1.5%) 1684 (2.9%) 991 (1.7%) 715 (1.2%)
Autopsy Performed
 No 26061 (38.8%) 29824 (44.4%) 28333 (42.3%) 22426 (38.7%) 26367 (45.6%) 24310 (42.0%)
 Yes 40682 (60.6%) 37152 (55.3%) 38382 (57.4%) 35240 (60.8%) 31232 (54.0%) 33322 (57.6%)
 Unknown/Missing 376 (0.6%) 217 (0.3%) 203 (0.3%) 308 (0.5%) 226 (0.4%) 188 (0.3%)
Place of Death
 Home 36826 (54.9%) 38361 (57.1%) 39718 (59.4%) 32178 (55.5%) 34270 (59.3%) 35380 (61.2%)
 Hospice or LTC 830 (1.2%) 407 (0.6%) 384 (0.6%) 340 (0.6%) 247 (0.4%) 213 (0.4%)
 Hospital 11063 (16.5%) 11632 (17.3%) 10561 (15.8%) 9476 (16.3%) 8470 (14.6%) 8298 (14.4%)
 Other 18067 (26.9%) 16565 (24.7%) 16128 (24.1%) 15769 (27.2%) 14560 (25.2%) 13761 (23.8%)
 Unknown/Missing 333 (0.5%) 228 (0.3%) 127 (0.2%) 211 (0.4%) 278 (0.5%) 168 (0.3%)

Fig 3 and S3 Table show the results of the quasi-Poisson regression models, adjusted for site, year, place of death, and autopsy status. The estimates reflect the relative ratio (RR) of mean character counts. Older age was consistently associated with shorter narratives, as was being Black (RR for C/ME: 0.94, 95% CI: 0.93–0.95) or Asian/Pacific Islander (RR for C/ME: 0.97, 95% CI: 0.95–0.99) relative to NHW and being single relative to being married (RR for C/ME: 0.98, 95% CI: 0.98–0.99). Females (RR for LE: 1.05, 95% CI: 1.04–1.05) and those with more education had longer narratives (e.g., RR for doctorate vs. HS: 1.05, 95% CI: 1.02–1.07 for C/ME). As shown by S4 Table, the results of the sensitivity analysis excluding the longest outlier narratives were consistent with the main results.
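Because each RR is a ratio of mean character counts, it can be read as a percent difference in mean narrative length relative to the reference group. The sketch below converts the point estimates quoted above; the function name is hypothetical and the conversion is a reading aid, not an additional result.

```python
def percent_difference(rr):
    """Percent difference in mean length implied by a ratio of means."""
    return round((rr - 1.0) * 100.0, 1)


reported_rr = {
    "Black vs. NHW (C/ME)": 0.94,
    "Asian/Pacific Islander vs. NHW (C/ME)": 0.97,
    "Single vs. married (C/ME)": 0.98,
    "Female vs. male (LE)": 1.05,
    "Doctorate vs. HS (C/ME)": 1.05,
}

for label, rr in reported_rr.items():
    print(f"{label}: {percent_difference(rr):+.1f}%")
# For example, an RR of 0.94 corresponds to a mean narrative about 6% shorter.
```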

Fig 3. Forest plot of relative ratios (95% confidence intervals) of the mean length of C/ME and LE narrative texts associated with decedent characteristics, NVDRS 2003–2017.


Estimates are adjusted for all variables shown in the figure as well as year, location of death, and autopsy status and account for clustering within site using GEE with robust standard errors.

S5–S8 Tables show the results of additional sensitivity analyses for missing C/ME and LE narratives (S5 and S6 Tables, respectively) and C/ME and LE narrative length (S7 and S8 Tables, respectively). Model 1 of these tables reprints our main analyses for ease of comparison. Model 2 shows estimates using imputed education level instead of dummy-coded missing status; the findings are largely unchanged using this imputed education variable, even if some point estimates are no longer statistically significant: higher education is inversely associated with the narrative being missing, particularly for the C/ME narratives, and, conditional on having a non-missing narrative, higher education is associated with longer texts for both C/ME and LE narratives. Model 3 provides the results from sensitivity analysis excluding all cases of undetermined cause of death and shows that findings were substantially unchanged from our main analysis. Finally, additionally adjusting for presence of a toxicology report (Model 4) had no substantive impact on our findings.

Discussion

Decedent characteristics are significantly related to the presence and length of narrative texts for suicide and undetermined deaths in the NVDRS, even after accounting for variation across sites, length of time the site had been participating in this surveillance system, and characteristics of the death event (i.e., location of death, autopsy status). To our knowledge this is the first study to comprehensively examine how decedent characteristics relate to the quantity of narrative data in this registry. We found that even after accounting for differences across sites and post-mortem factors, decedents who were older, racial/ethnic minority, and had less education were more likely to have missing narrative texts. Further, even among those with a narrative, these characteristics were also predictive of shorter texts. These findings extend prior research in this registry that has examined how decedent characteristics relate to classification of cause of death (i.e., suicide vs. undetermined) [32] and factors that relate to the completeness of these data within specific states [33]. While this study cannot determine why narrative length varies as a function of these characteristics, this variation has implications for studies that seek to leverage these data to understand salient factors for suicide risk both within and across groups.

This study also identified several system-level factors associated with the presence and length of the narratives which researchers should be aware of when using these texts to investigate suicide mortality. LE narratives were more likely to be missing than C/ME ones, and prior work has shown that it is more challenging for state NVDRS staff to collate reports from decentralized law enforcement systems [25, 33]. Conditional on having a narrative, C/ME narratives were substantially longer than LE texts, which may indicate they have more information potential for researchers seeking to identify novel risk factors. Sites that were newer to the NVDRS generated shorter narratives than those that had been in the system longer, potentially reflecting relative inexperience with writing these narratives or less established relationships with stakeholders (i.e., local law enforcement agencies) who provide the original source materials to the state NVDRS to abstract for the texts. Finally, while not part of the RAD that external researchers can access, there may be data processing variables that are created as part of the NVDRS abstraction process that internal staff could use to identify the specific reasons why a particular narrative is missing (e.g., indicators that the incident report needed follow-up; the specific document source; whether or not the document was available to the coder), which the CDC could use to identify system-level factors that contribute to data (in)completeness.

Suicide risk (attempts and mortality) has increased for the entire US over the past 20 years, particularly among Black adolescents [34] and middle-aged (age 45–64) adults [35]. Efforts to understand how these demographic characteristics intersect with known risk factors for suicidal behavior (i.e., depression, substance misuse, pain, loneliness, functional limitations, major life events), or, more importantly, to identify how these characteristics relate to modifiable protective factors, require high-quality data at a population scale. The NVDRS narratives are an important resource for researchers and policy makers as they seek to inform and implement evidence-based programs to reduce suicide risk, particularly to identify novel risk factors. For example, researchers have used the narrative texts to identify suicides related to transitioning into long-term care [13], intimate partner violence [15], risk factors among military personnel [17], and how multiple risk factors interact for middle-age men and women [14]. Such efforts are needed to address the stagnation in the field noted by Franklin et al. [6]. However, as this analysis indicates, there are systematic biases in the amount of information in these narratives as a function of decedent characteristics. Accounting for these biases will enhance the rigor of future studies that seek to extract the information potential of these narratives, whether using data science or traditional qualitative approaches.

Findings should be interpreted considering study limitations and strengths. First, this study cannot identify the reasons for the incompleteness or length of the narratives. For example, if police are less likely to be called to investigate the deaths of older decedents this could result in more missing or shorter LE narratives, but this cannot be determined from the registry data. Second, briefer narratives are not necessarily of poor quality; while it is beyond the scope of this analysis, future work should examine whether the information content in the narratives is related to decedent characteristics. This study also has several strengths. The large sample size and breadth of variables allowed us to explore variation across a wide range of decedent characteristics, and these findings can inform future data science (i.e., NLP) as well as traditional qualitative analysis of these narratives.

Although the NVDRS is a registry that is collated for researchers, the source documents it relies on to generate its data, both quantitative and qualitative (i.e., law enforcement reports, death certificates), were designed with a different purpose and are created by non-researchers (i.e., police officers, coroners, etc.). This is not a unique problem: for example, health services researchers routinely use insurance billing records to quantify the burden of disease and identify risk factors even though these records were designed for tracking healthcare payments. It is recognized that billing records have valuable information regarding population health and well-being, but also that these records are incomplete indicators of those constructs.

Conceptually, the NVDRS has complete catchment of suicide mortality in the United States. This potential makes it an invaluable resource for public health. However, the amount of information that is contained in this registry is uneven. Systematic patterns in incomplete data, particularly across racial/ethnic groups, have been previously documented in mortality records [26, 36–38] and population health surveillance efforts (e.g., COVID infection and mortality [39]). The CDC and state NVDRS programs should examine why the information bias identified in this study occurs, and work with local, state, and federal stakeholders, as well as external researchers, to address it. Potential means of addressing the issues identified in this existing archive include the creation of sampling weights that account for differential selection (i.e., missingness) of having a narrative, and collaborating with data users to create trainings for researchers who want to use the narrative data to ensure their analytic approach minimizes potential biases. For future data abstraction in this archive, NVDRS sites should experiment with different approaches to incentivize more complete data collection from local stakeholders and high-quality narrative abstraction. These text data have tremendous potential to provide new insights into suicide risk, and minimizing information bias in them will help ensure these narratives fulfill that potential.
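The idea of sampling weights that account for differential missingness of the narrative can be illustrated with a minimal sketch. It assumes the simplest possible scheme: within each stratum, the weight is the inverse of the observed proportion of cases that have a narrative. The function name, the field names (`group`, `has_narrative`), and the toy records are hypothetical and are not drawn from the NVDRS; a real application would more likely model the probability of having a narrative from several covariates at once.

```python
from collections import defaultdict


def stratum_weights(records, stratum_key, has_narrative_key):
    """Inverse-probability weights: 1 / P(narrative present | stratum)."""
    totals = defaultdict(int)
    with_text = defaultdict(int)
    for r in records:
        s = r[stratum_key]
        totals[s] += 1
        if r[has_narrative_key]:
            with_text[s] += 1
    # Strata with no narratives at all have an undefined weight and are skipped.
    return {s: totals[s] / with_text[s] for s in totals if with_text[s] > 0}


# Toy records (hypothetical): half of group A lacks a narrative, group B is complete.
records = (
    [{"group": "A", "has_narrative": True}] * 2
    + [{"group": "A", "has_narrative": False}] * 2
    + [{"group": "B", "has_narrative": True}] * 2
)

weights = stratum_weights(records, "group", "has_narrative")

# Weighting the cases that do have a narrative recovers the full sample size.
weighted_n = sum(weights[r["group"]] for r in records if r["has_narrative"])
```

In this toy example each group A case with a narrative stands in for two decedents, so analyses of the available texts would not under-represent the group with more missing narratives.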

Supporting information

S1 Appendix. Exploring the “information potential” of short and long narrative texts.

(DOCX)

S1 Table. Logistic regression of missing status for NVDRS narratives abstracted from coroner/medical examiner and law enforcement reports.

(DOCX)

S2 Table. Logistic regression of missing status for NVDRS narratives abstracted from coroner/medical examiner and law enforcement reports: Sensitivity analysis excluding sites with <5 missing narratives.

(DOCX)

S3 Table. Quasi-Poisson regression of character length of NVDRS narratives predicted by demographic characteristics.

(DOCX)

S4 Table. Quasi-Poisson regression of character length of NVDRS narratives predicted by demographic characteristics, excluding outliers (longest 1% of narratives).

(DOCX)

S5 Table. Sensitivity analyses for logistic regression of missing status for NVDRS narratives abstracted from coroner/medical examiner (CME) reports.

(DOCX)

S6 Table. Sensitivity analyses for logistic regression of missing status for NVDRS narratives abstracted from law enforcement (LE) reports.

(DOCX)

S7 Table. Sensitivity analyses of Quasi-Poisson regression of character length of NVDRS narratives abstracted from coroner/medical examiner (CME) reports.

(DOCX)

S8 Table. Sensitivity analyses of Quasi-Poisson regression of character length of NVDRS narratives abstracted from law enforcement (LE) reports.

(DOCX)

Acknowledgments

Disclaimer: The findings and conclusions of this study are those of the authors alone and do not necessarily represent the official position of the Centers for Disease Control and Prevention or of participating National Violent Death Reporting System (NVDRS) states. The NVDRS is administered by the Centers for Disease Control and Prevention by participating NVDRS states.

Data Availability

The narrative data used in this analysis are available by request from the CDC through their restricted-access data process. Our use of these restricted-access NVDRS data is governed by a Data Use Agreement (DUA) with the CDC. This DUA legally prohibits us from sharing these data with outside investigators. Any investigator can gain access to these restricted-access NVDRS data by contacting nvdrs-rad@cdc.gov and following the procedures outlined here: https://www.cdc.gov/violenceprevention/datasources/nvdrs/dataaccess.html. Other NVDRS data are publicly available: https://www.cdc.gov/violenceprevention/datasources/nvdrs/datapublications.html. Cells with <5 observations have been suppressed in this publication, as required by the NVDRS Data Use Agreement.

Funding Statement

This project was supported by the National Institute of Mental Health (R21-108989, https://www.nimh.nih.gov/) and the American Foundation for Suicide Prevention (DIG-1-110-19, https://afsp.org/), both to B. Mezuk. The funders had no role in the conceptualization, analysis, interpretation, or decision to publish this manuscript.

References

  • 1.Hedegaard H, Curtin S, Warner M. Increase in Suicide Mortality in the United States, 1999–2018. NCHS Data Brief 2020;362. [PubMed] [Google Scholar]
  • 2.National Action Alliance for Suicide Prevention. National Strategy for Suicide Prevention n.d. https://theactionalliance.org/our-strategy/national-strategy-suicide-prevention (accessed December 20, 2020).
  • 3.American Foundation of Suicide Prevention. Three Year Strategic Plan. American Foundation for Suicide Prevention 2020. https://afsp.org/three-year-strategic-plan (accessed December 20, 2020).
  • 4.Gordon J, Volkow N. Suicide Deaths Are a Major Component of the Opioid Crisis that Must Be Addressed. NIMH Director’s Message 2019. https://www.nimh.nih.gov/about/director/messages/2019/suicide-deaths-are-a-major-component-of-the-opioid-crisis-that-must-be-addressed.shtml (accessed December 20, 2020).
  • 5.Office of the Surgeon General AS for H (ASH). Suicide Prevention Reports And Publications. HHSGov 2019. https://www.hhs.gov/surgeongeneral/reports-and-publications/suicide-prevention/index.html (accessed December 20, 2020).
  • 6.Franklin JC, Ribeiro JD, Fox KR, Bentley KH, Kleiman EM, Huang X, et al. Risk factors for suicidal thoughts and behaviors: A meta-analysis of 50 years of research. Psychological Bulletin 2017;143:187–232. 10.1037/bul0000084 [DOI] [PubMed] [Google Scholar]
  • 7.CDC. CDC’s National Violent Death Reporting System (NVDRS) n.d. https://www.cdc.gov/violenceprevention/pdf/NVDRS-factsheet508.pdf (accessed December 20, 2020).
  • 8.CDC National Violent Death Reporting System. NVDRS Data and Publications 2019. https://www.cdc.gov/violenceprevention/datasources/nvdrs/datapublications.html (accessed December 20, 2020).
  • 9.CDC. CDC’s National Violent Death Reporting System now includes all 50 states 2018. https://www.cdc.gov/media/releases/2018/p0905-national-violent-reporting-system.html (accessed December 20, 2020).
  • 10.Nazarov O, Guan J, Chihuri S, Li G. Research utility of the National Violent Death Reporting System: a scoping review. Inj Epidemiol 2019;6. 10.1186/s40621-019-0196-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Cavanagh J, Carson A, Sharp M, Lawrie S. Psychological autopsy studies of suicide: a systematic review. Psychological Medicine 2003;33:395–405. doi: 10.1017/s0033291702006943 [DOI] [PubMed] [Google Scholar]
  • 12.McGill K, Hackney S, Skehan J. Information needs of people after a suicide attempt: A thematic analysis. Patient Educ Couns 2019;102:1119–24. 10.1016/j.pec.2019.01.003 [DOI] [PubMed] [Google Scholar]
  • 13.Mezuk B, Ko TM, Kalesnikava VA, Jurgens D. Suicide Among Older Adults Living in or Transitioning to Residential Long-term Care, 2003 to 2015. JAMA Netw Open 2019;2:e195627. 10.1001/jamanetworkopen.2019.5627 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Stone DM, Holland KM, Schiff LB, McIntosh WL. Mixed Methods Analysis of Sex Differences in Life Stressors of Middle-Aged Suicides. Am J Prev Med 2016;51:S209–18. 10.1016/j.amepre.2016.07.021 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Brown S, Seals J. Intimate partner problems and suicide: are we missing the violence? J Inj Violence Res 2019;11:53–64. 10.5249/jivr.v11i1.997 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Roberts K, Miller M, Azrael D. Honor-Related Suicide in the United States: A Study of National Violent Death Reporting System Data. Archives of Suicide Research 2019;23:34–46. 10.1080/13811118.2017.1411299 [DOI] [PubMed] [Google Scholar]
  • 17.Skopp NA, Holland KM, Logan JE, Alexander CL, Floyd CF. Circumstances preceding suicide in U.S. soldiers: A qualitative analysis of narrative data. Psychological Services 2019;16:302–11. 10.1037/ser0000221 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Choi NG, DiNitto DM, Marti CN, Conwell Y. Physical Health Problems as a Late-Life Suicide Precipitant: Examination of Coroner/Medical Examiner and Law Enforcement Reports. The Gerontologist 2019;59:356–67. 10.1093/geront/gnx143 [DOI] [PubMed] [Google Scholar]
  • 19.National Violent Death Reporting System Web Coding Manual, v5.3 n.d.:205.
  • 20.Safe States Alliance. NVDRS: Stories from the frontlines of violent death surveillance. 2015.
  • 21.Delgado-Rodríguez M, Llorca J. Bias. Journal of Epidemiology & Community Health 2004;58:635–41. 10.1136/jech.2003.008466 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Guetterman TC, Chang T, DeJonckheere M, Basu T, Scruggs E, Vydiswaran VGV. Augmenting Qualitative Text Analysis with Natural Language Processing: Methodological Study. J Med Internet Res 2018;20:e231. 10.2196/jmir.9702 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.WISQARS (Web-based Injury Statistics Query and Reporting System)|Injury Center|CDC 2020. https://www.cdc.gov/injury/wisqars/index.html (accessed December 21, 2020).
  • 24.Crosby AE, Mercy JA, Houry D. The National Violent Death Reporting System: Past, Present, and Future. American Journal of Preventive Medicine 2016;51:S169–72. 10.1016/j.amepre.2016.07.022 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Logan JE, Karch DL, Crosby AE. Reducing “Unknown” Data in Violent Death Surveillance: A Study of Death Certificates, Coroner/Medical Examiner and Police Reports From the National Violent Death Reporting System, 2003–2004. Homicide Studies 2009;13:385–97. 10.1177/1088767909348323. [DOI] [Google Scholar]
  • 26.Hoffman RA, Venugopalan J, Qu L, Wu H, Wang MD. Improving Validity of Cause of Death on Death Certificates. ACM BCB 2018;2018:178–83. 10.1145/3233547.3233581 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Björkenstam C, Johansson L-A, Nordström P, Thiblin I, Fugelstad A, Hallqvist J, et al. Suicide or undetermined intent? A register-based study of signs of misclassification. Popul Health Metr 2014;12:11. 10.1186/1478-7954-12-11 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Bakst SS, Braun T, Zucker I, Amitai Z, Shohat T. The accuracy of suicide statistics: are true suicide deaths misclassified? Soc Psychiatry Psychiatr Epidemiol 2016;51:115–23. 10.1007/s00127-015-1119-x [DOI] [PubMed] [Google Scholar]
  • 29.CDC. US Standard Certificate of Death n.d.
  • 30.McCaffrey DF, Bell RM. Improved hypothesis testing for coefficients in generalized estimating equations with small samples of clusters. Stat Med 2006;25:4081–98. 10.1002/sim.2502 [DOI] [PubMed] [Google Scholar]
  • 31.Ver Hoef JM, Boveng PL. Quasi-Poisson vs. negative binomial regression: how should we model overdispersed count data? Ecology 2007;88:2766–72. 10.1890/07-0043.1 [DOI] [PubMed] [Google Scholar]
  • 32.Huguet N, Kaplan MS, McFarland BH. Rates and correlates of undetermined deaths among African Americans: results from the National Violent Death Reporting System. Suicide Life Threat Behav 2012;42:185–96. 10.1111/j.1943-278X.2012.00081.x [DOI] [PubMed] [Google Scholar]
  • 33.Dailey NJM, Norwood T, Moore ZS, Fleischauer AT, Proescholdbell S. Evaluation of the North Carolina Violent Death Reporting System, 2009. N C Med J 2012;73:257–62. [PubMed] [Google Scholar]
  • 34.Shain BN. Increases in Rates of Suicide and Suicide Attempts Among Black Adolescents. Pediatrics 2019;144. 10.1542/peds.2019-1912 [DOI] [PubMed] [Google Scholar]
  • 35.Stone DM, Simon T, Fowler K, Kegler S, Yuan K, Holland K, et al. Trends in State Suicide Rates—United States, 1999–2016 and Circumstances Contributing to Suicide—27 States, 2015. MMWR Morb Mortal Wkly Rep 2018;67. 10.15585/mmwr.mm6722a1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Johns LE, Madsen AM, Maduro G, Zimmerman R, Konty K, Begier E. A Case Study of the Impact of Inaccurate Cause-of-Death Reporting on Health Disparity Tracking: New York City Premature Cardiovascular Mortality. Am J Public Health 2013;103:733–9. 10.2105/AJPH.2012.300683 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Elo IT, Preston SH. Estimating African-American Mortality from Inaccurate Data. Demography 1994;31:427–58. 10.2307/2061751. [DOI] [PubMed] [Google Scholar]
  • 38.Sehdev AES, Hutchins GM. Problems With Proper Completion and Accuracy of the Cause-of-Death Statement. Arch Intern Med 2001;161:277. 10.1001/archinte.161.2.277 [DOI] [PubMed] [Google Scholar]
  • 39.Labgold K, Hamid S, Shah S, Gandhi NR, Chamberlain A, Khan F, et al. Estimating the Unknown: Greater Racial and Ethnic Disparities in COVID-19 Burden After Accounting for Missing Race and Ethnicity Data. Epidemiology 2021;32:157–61. 10.1097/EDE.0000000000001314 [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

Ellen L Idler

4 Feb 2021

PONE-D-21-00409

Not discussed: Inequalities in narrative text data for suicide deaths in the National Violent Death Reporting System

PLOS ONE

Dear Dr. Mezuk,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Two expert reviewers with considerable research in the area have provided reviews of your paper.  They both see considerable merit in your paper but make suggestions for strengthening the analysis. One issue raised by both is a concern with state-level variation; both reviewers provide specific suggestions for further analysis.  Two additional critiques I would highlight would be the issue of the relationship of LE/CME missingness to missingness of other data (Reviewer 1) and the relationship of these findings to the larger issue of missing data in other official records (Reviewer 2).  Finally, a comment of my own -- it would be helpful to know the extent of overlap of the two types of missing (and nonmissing) data.  Please address all points raised by the reviewers.

Please submit your revised manuscript by Mar 21 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Ellen L. Idler

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper is well-written, engaging to read, and addresses a basic (but important) question about the quality of NVDRS narrative data. With a few methodological changes, discussed below, I think that this paper contributes to the existing literature and may help current NVDRS abstractors address inequalities in data collection.

Before I continue, I want to preface this review by clarifying that my current knowledge of the NVDRS system is on the abstraction side--I work with a state NVDRS and SUDORS (an overdose-specific subset of the NVDRS) team to improve their data quality. This means that I have a good knowledge of the data itself, but less knowledge about the data version your team used for analysis.

The first central methodological issue I see in this paper is a failure to fully address the state level variations in the data. Simply including a state dummy variable is not sufficient, as it does not address the fact that error terms will be clustered at the state level. While you acknowledge that there is state-level variation in the data, you need to build in either clustered error effects or potentially use a hierarchical model to fully address the variation. The NVDRS entry system is complex, and states have developed a wide variety of methodologies for submitting their data. Further, the difficulties of centralized versus local authority reporting mean that some states may get electronic data downloads with CME reports (which often include written narratives), while others rely on scanned PDFs and fully manual abstraction. Finally, these reports are on tight schedules, so states that rely on fully manual abstraction may have less time to develop lengthy narratives. In short, I suspect state-level factors may be even more important than you suggested, and need a more robust inclusion in the model. If I were you, I would run a hierarchical model clustering at the state level, with the same outcomes/distributions.
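The concern about state-level clustering can be made concrete with a toy calculation. The sketch below is not the hierarchical or GEE model discussed in this exchange; it swaps in a simpler diagnostic, the one-way ANOVA intraclass correlation (ICC), together with the standard design-effect approximation, to show how strongly observations from the same site can resemble one another. The site labels and narrative lengths are made up for illustration.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way ANOVA intraclass correlation for a dict {cluster: [values]}."""
    k = len(groups)
    sizes = [len(v) for v in groups.values()]
    n_total = sum(sizes)
    grand = sum(sum(v) for v in groups.values()) / n_total
    means = {g: mean(v) for g, v in groups.items()}

    ss_between = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Adjusted average cluster size; equals the common size when clusters are balanced.
    n0 = (n_total - sum(s * s for s in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


def design_effect(icc, cluster_size):
    """Approximate variance inflation from ignoring clustering (equal cluster sizes)."""
    return 1 + (cluster_size - 1) * icc


# Hypothetical narrative lengths (characters) for three made-up sites.
lengths_by_site = {
    "site_A": [900, 950, 1000],
    "site_B": [400, 450, 500],
    "site_C": [700, 750, 800],
}

icc = icc_oneway(lengths_by_site)
deff = design_effect(icc, cluster_size=3)
```

An ICC near 1, as in this toy data, means most of the variation lies between sites, and a design effect above 1 indicates that standard errors computed as if observations were independent would be too small.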

My second methodological issue is more basic. It seemed odd to me that the missing narratives were not discussed in the context of other missing variables. LE narratives, in particular, were probably missing due to a lack of law enforcement investigation (which you do mention). If the data allows, restricting your analysis to only cases where some other data from the same source was entered would be helpful. Otherwise, it is not clear if your results are just picking up on a lack of investigation, rather than a specific issue with the narratives themselves. Your second analysis, which focuses on the length of narratives, helped address this problem, but I think you need to be a bit more specific in either addressing why you did not filter out cases without ANY LE information, or remove them. I suspect that there is a strong correlation--the NVDRS training strongly discourages abstractors from skipping the narrative if any LE or CME information exists. If that is true, your paper may need to more clearly acknowledge that the missing narrative problem is directly and solely driven by a missing data problem. If the correlation is moderate, you could include other variable missingness (as a percent, maybe?) as a variable in your regressions.

Your work briefly acknowledged many of the issues I discuss above, and I think will be a strong and interesting paper once they are more squarely addressed. I would love to see a revision, and to share the final version with my team--I know they would find it interesting!

Reviewer #2: This paper uses NVDRS data to examine how decedent characteristics are related to the length of narratives contained in the data set. Increasing numbers of studies are using NVDRS narratives to shed light on circumstances surrounding suicide. Thus, this study, although primarily descriptive in nature, is useful in encouraging researchers to think through possible biases in analyses of narratives. In some sense, the findings are not terribly surprising – they are consistent with what we know to be true from undercounts in the Census and inaccuracies in other official sources of data. Those who are male and racial minorities are more likely to be excluded in both cases.

I have the following suggestions for improvement:

1. The authors use the length of the narrative (in terms of character count) as a proxy for the information potential and quality of text. There are limitations to this approach, as noted by the authors on page 25. It would be a useful addition to include a small random sample of narratives of different lengths to contextualize the differences in the quality of information contained in these narratives. Were any sensitivity analyses conducted to determine if the results differ if number of words (rather than number of characters including spaces) is used to proxy length?

2. P. 13: Given the possibility for coder bias, can the authors control for individual NVDRS coders and/or their length of experience? For example, if newly-added states to the NVDRS are more demographically diverse and less experienced coders are working on those narratives, it could skew results. I don’t know whether that’s the case but it’s one of several possibilities.

3. There are also important differences in the background of medical examiners and coroners which may affect the original reports. The study controls for states, thus capturing potential state differences, but beyond state controls, have they considered other ways to capture geographic differences in death investigation systems (e.g., https://www.cdc.gov/phlp/publications/coroner/death.html)? Data on county of residence of the decedent are included in the NVDRS.

4. The authors might consider putting Tables 1 and 2 in the appendix and the regression analyses currently in the appendix in the body of the paper. Tables 1 and 2 should indicate whether the differences (e.g. between non-missing/missing and between C/ME and LE) are statistically significant.

5. Regarding a dose response relationship between age and odds of a missing narrative “consistent for C/ME and LE texts” (p. 20, lines 244-246). According to the CI in Table 1, the effect of age on the C/ME missing is generally not significant, and there are no differences across the age groups in the effect. The patterns are different for LE missing.

6. As alluded to in #2, the controls are interesting in their own right. For example, there are significant differences across states and over time in the patterns of missingness and narrative length. At a minimum, it would be useful to provide more discussion and suggestions for future research as to why these differences exist. Some of the between-state difference may relate to the points raised on page 24 (some could be tested explicitly – e.g. time in system).

7. The discussion would also benefit from further explication of possible reasons as to why these patterns exist. For example, studies of the Census undercount and/or inaccuracies in other official records would provide insight. The authors mention this only in the last sentence of the paper. It would be useful to synthesize and relate some of the possible explanations for these patterns in other sources to this analysis to provide a richer interpretation of the findings.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jul 16;16(7):e0254417. doi: 10.1371/journal.pone.0254417.r002

Author response to Decision Letter 0


20 Apr 2021

Comments from the editor

1. Two additional critiques I would highlight would be the issue of the relationship of LE/CME missingness to missingness of other data (Reviewer 1) and the relationship of these findings to the larger issue of missing data in other official records (Reviewer 2). Finally, a comment of my own -- it would be helpful to know the extent of overlap of the two types of missing (and nonmissing) data.

Thank you for this comment. We would like to clarify that our dataset includes three “types” of missing data: (1) Observations with a narrative character length of zero (C/ME, LE, or both) or whose circumstances were coded as “not known” by the NVDRS abstractors; (2) Observations whose narrative status was assigned as missing by the investigators based on having <31 characters, even though circumstances were coded as “known” by the NVDRS abstractors; and (3) missing data on the predictors (age, sex, education, etc.).

To illustrate the extent of overlap in missingness for the first two types, we have added the variable “Circumstances known” to the bottom of Table 1. It illustrates that of those observations whose C/ME narrative is coded as “missing,” 62.2% are missing because the circumstances were not known (first type of missing data) and 37.8% were coded as missing by the investigators because the narratives were <31 characters long (second type of missing data). These proportions are roughly reversed for the LE narratives. Overall, 6,170 (3%) observations were missing both C/ME and LE narratives (which we now state in the text - see Methods, Analysis 1).
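The first two types of missingness can be expressed as a simple classification rule. The following is a minimal Python sketch, not the analysis code itself: the function name, the status labels, and the toy records are hypothetical, and treating “circumstances known” as a single per-observation flag is an assumption. Only the 31-character threshold comes from the description above.

```python
MIN_CHARS = 31  # narratives shorter than this are treated as missing (threshold from the text)


def narrative_status(text, circumstances_known):
    """Classify one narrative; labels are illustrative, not NVDRS codes."""
    length = len(text or "")
    if length == 0 or not circumstances_known:
        return "missing_type1"  # empty text, or circumstances coded "not known"
    if length < MIN_CHARS:
        return "missing_type2"  # circumstances known, but text too short to use
    return "present"


# Toy records (hypothetical): C/ME text, LE text, circumstances-known flag.
records = [
    {"cme": "x" * 400, "le": "x" * 250, "known": True},
    {"cme": "x" * 400, "le": "", "known": True},
    {"cme": "see report", "le": "", "known": True},
    {"cme": "", "le": "", "known": False},
]

statuses = [
    (narrative_status(r["cme"], r["known"]), narrative_status(r["le"], r["known"]))
    for r in records
]

# Overlap: observations where both narratives count as missing.
missing_both = sum(1 for cme, le in statuses if cme != "present" and le != "present")
print(statuses)
print("missing both narratives:", missing_both)
```

Keeping the two missing types as separate labels, rather than a single yes/no flag, makes it possible to tabulate how much of the missingness comes from each source.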

This third type of missing data (missing data on predictors) is addressed in our response to Reviewer #1, comment #2.

The issue of missing data in mortality/administrative records more generally is addressed in our response to Reviewer #2, comments #6 and 7.

2. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

We have made these style changes, as requested.

3. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts: a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent. Or b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We have responded to these prompts in the revised cover letter, as requested.

Reviewer #1 Comments to the Author

This paper is well-written, engaging to read, and addresses a basic (but important) question about the quality of NVDRS narrative data. With a few methodological changes, discussed below, I think that this paper contributes to the existing literature and may help current NVDRS abstractors address inequalities in data collection. Before I continue, I want to preface this review by clarifying that my current knowledge of the NVDRS system is on the abstraction side--I work with a state NVDRS and SUDORS (an overdose-specific subset of the NVDRS) team to improve their data quality. This means that I have a good knowledge of the data itself, but less knowledge about the data version your team used for analysis.

1. The first central methodological issue I see in this paper is a failure to fully address the state level variations in the data. Simply including a state dummy variable is not sufficient, as it does not address the fact that error terms will be clustered at the state level. While you acknowledge that there is state-level variation in the data, you need to build in either clustered error effects or potentially use a hierarchical model to fully address the variation. The NVDRS entry system is complex, and states have developed a wide variety of methodologies for submitting their data. Further, the difficulties of centralized versus local authority reporting mean that some states may get electronic data downloads with CME reports (which often include written narratives), while others rely on scanned PDFs and fully manual abstraction. Finally, these reports are on tight schedules, so states that rely on fully manual abstraction may have less time to develop lengthy narratives. In short, I suspect state-level factors may be even more important than you suggested, and need a more robust inclusion in the model. If I were you, I would run a hierarchical model clustering at the state level, with the same outcomes/distributions.

Thank you for this comment. You are correct that there is significant clustering by state. We quantified this using the intraclass correlation coefficient (ICC), which compares the variation between vs. within states.

ICC for C/ME missingness: 0.57, for LE missingness: 0.48

As a comparison, the ICC for clustering of C/ME missingness by incident year was only 0.01 and for LE missingness was only 0.03. This illustrates that the amount of missingness in narratives clusters within states, but not within years (time). As a result, incident year was included as a covariate in all models of narrative missingness, and state was used as a clustering variable.

ICC for C/ME length: 0.35, for LE length: 0.43

As a comparison, the ICC for clustering of C/ME length by incident year was only 0.18 and for LE length was only 0.11. This illustrates that the length of narratives clusters within states, with some modest clustering within years (time). As a result, incident year was included as a covariate in all models of narrative length, and state was used as a clustering variable.
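For readers who want to run a similar clustering check, the sketch below computes an ICC from grouped data. It assumes the one-way ANOVA estimator, which is one common choice; the estimator behind the values reported above is not stated here, so this is an illustration rather than the method used. The function name and the toy narrative lengths are hypothetical.

```python
def icc_oneway(groups):
    """One-way ANOVA ICC for (possibly unbalanced) clusters.

    groups: dict mapping a cluster label (e.g. state) to a list of numeric outcomes.
    """
    k = len(groups)
    sizes = [len(v) for v in groups.values()]
    n_total = sum(sizes)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two clusters and some within-cluster replication")

    grand_mean = sum(sum(v) for v in groups.values()) / n_total
    means = {g: sum(v) / len(v) for g, v in groups.items()}

    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective cluster size, which adjusts for unequal cluster sizes.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Toy narrative lengths (characters) grouped by a hypothetical state label.
toy_lengths = {"A": [800, 820, 790], "B": [400, 430, 410], "C": [600, 640]}
icc_toy = icc_oneway(toy_lengths)
print(round(icc_toy, 3))
```

Applying this estimator to a 0/1 missingness indicator is only an approximation; for binary outcomes, an ICC defined on the latent scale of a mixed model is a frequently used alternative.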

In response, we considered the following methods: 1) a quasi-Poisson model with robust standard errors, 2) multilevel modeling (in our case, a Generalized Linear Mixed Model (GLMM)) and bootstrap, with state and year as random effects, and 3) a generalized estimating equation (GEE) with states as clusters, assuming an exchangeable correlation structure and using a sandwich estimator to be robust to misspecification. All of these options are appropriate for addressing the violation of independence and homogeneity of variance. The first option works best when the source of heteroscedasticity is unknown. However, in our case we know that the data came in clusters (states).

Both the second and third options are appropriate for our research question and data. GEE and GLMM in general give very similar results. The difference between the two is that GEE gives population-averaged estimates of the parameters (here, averaged across states), whereas GLMM gives cluster-specific (here, state-specific) estimates. Since we are not interested in state-specific parameter estimates, GEE is a common choice. GEE in general requires a large number of clusters (~40), and the NVDRS dataset has 37 states as clusters.

Therefore, we refit our regression models using GEE logistic (for narrative missingness) and quasi-Poisson (for narrative length, conditional on non-missingness) models with states as clusters, assuming an exchangeable correlation structure and using a sandwich estimator to be robust to misspecification.
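For reference, the general form of the GEE approach described above can be written as follows. This is the standard textbook formulation, given as a sketch; the notation (K states, mean vector mu_i, working correlation R(alpha), dispersion phi) is introduced here for illustration and is not taken from the manuscript.

```latex
% State i = 1, ..., K contributes an outcome vector y_i with mean \mu_i(\beta).
% Logistic model (missingness):   logit(\mu_{ij}) = x_{ij}^\top \beta,  v(\mu) = \mu(1 - \mu)
% Quasi-Poisson model (length):   \log(\mu_{ij})  = x_{ij}^\top \beta,  v(\mu) = \mu,  \phi estimated
\begin{align}
  \sum_{i=1}^{K} D_i^\top V_i^{-1}\,\bigl(y_i - \mu_i(\beta)\bigr) &= 0,
  \qquad D_i = \frac{\partial \mu_i}{\partial \beta^\top},\\
  V_i &= \phi\, A_i^{1/2}\, R(\alpha)\, A_i^{1/2},
  \qquad A_i = \operatorname{diag}\{v(\mu_{ij})\},\\
  R(\alpha)_{jk} &= \begin{cases} 1, & j = k\\ \alpha, & j \neq k \end{cases}
  \quad\text{(exchangeable)},\\
  \widehat{\operatorname{Cov}}(\hat\beta) &= B^{-1} M B^{-1},\quad
  B = \sum_{i=1}^{K} D_i^\top V_i^{-1} D_i,\quad
  M = \sum_{i=1}^{K} D_i^\top V_i^{-1}\,(y_i - \hat\mu_i)(y_i - \hat\mu_i)^\top\, V_i^{-1} D_i .
\end{align}
```

The sandwich form in the last line is what keeps the standard errors valid for large numbers of clusters even if the exchangeable working correlation is misspecified.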

We have updated the Methods and the text, tables and figures of the Results using these GEE models. We note that the point estimates are substantially unchanged from the first manuscript, but now the standard errors reflect the clustering of observations within states. Please see the revised manuscript.

Also please see our response to Reviewer #2, comment #6 in which we discuss the importance of local, state, and federal stakeholders to collaborate to identify, and address, the sources of these systematic biases in data completeness.

2. My second methodological issue is more basic. It seemed odd to me that the missing narratives were not discussed in the context of other missing variables. LE narratives, in particular, were probably missing due to a lack of law enforcement investigation (which you do mention). If the data allows, restricting your analysis to only cases where some other data from the same source was entered would be helpful. Otherwise, it is not clear if your results are just picking up on a lack of investigation, rather than a specific issue with the narratives themselves. Your second analysis, which focuses on the length of narratives, helped address this problem, but I think you need to be a bit more specific in either addressing why you did not filter out cases without ANY LE information, or remove them. I suspect that there is a strong correlation--the NVDRS training strongly discourages abstractors from skipping the narrative if any LE or CME information exists. If that is true, your paper may need to more clearly acknowledge that the missing narrative problem is directly and solely driven by a missing data problem. If the correlation is moderate, you could include other variable missingness (as a percent, maybe?) as a variable in your regressions.

Thank you for this comment. Generally, there is relatively little missing data on the predictors/covariates used in the analysis, as they are largely drawn from death certificates, as shown in Table 1 (e.g., only 89 observations (0.04%) in the dataset in total (that is, not conditional on having a narrative) were missing the variable “sex”). In addition, we explicitly model any covariate missingness in our regression models (see Supplemental Tables 1 and 2) - that is, we do not exclude any cases because of missingness on covariates, but instead include a dummy-coded “missing” value for every predictor in our analysis.

Finally, the analysis of narrative length (quasi-Poisson modeling) uses a truncated distribution of narratives to examine length, wherein we removed any observation with missing narratives from this analysis (either truly missing (character length=0) or assigned by us as being missing (character length<31, a threshold we determined through exploring the content of texts across different lengths)).

In response to this comment, we have clarified how we handled missing data on covariates in the Methods (reprinted below):

“While the amount of missing data in these predictor variables was generally small (see Table 1a), we included a dummy code for missingness for all predictors so that these observations were retained in the regression analyses.”
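The missing-indicator coding described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name, the education levels, and the use of `None` to represent a missing value are all hypothetical and are not drawn from the NVDRS coding manual.

```python
def dummy_code(values, levels):
    """One-hot encode a categorical predictor with an explicit 'missing' column.

    values: observed categories, with None standing in for a missing value.
    levels: the known, non-missing categories.
    """
    columns = list(levels) + ["missing"]
    rows = []
    for v in values:
        if v is None:
            key = "missing"
        elif v in levels:
            key = v
        else:
            raise ValueError(f"unexpected level: {v!r}")
        rows.append({c: int(c == key) for c in columns})
    return rows


# Hypothetical education values for four decedents; one is missing.
education = ["HS", None, "College", "HS"]
coded = dummy_code(education, ["HS", "College"])
for row in coded:
    print(row)
```

Every observation is retained because a missing value becomes its own category rather than a dropped row. In a regression, one non-missing level would be omitted as the reference category.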

Reviewer #2 Comments to the Author

1. The authors use the length of the narrative (in terms of character count) as a proxy for the information potential and quality of text. There are limitations to this approach, as noted by the authors on page 25. It would be a useful addition to include a small random sample of narratives of different lengths to contextualize the differences in the quality of information contained in these narratives. Were any sensitivity analyses conducted to determine if the results differ if number of words (rather than number of characters including spaces) is used to proxy length?

In response to this comment, we have added annotated examples of a random sample of five short (31 to <200 characters) and five long (>500 characters) C/ME and LE narratives to the Appendix (which also provides examples of texts that have <31 characters). The annotation describes the elements and features of these texts and serves as a crude proxy of the “information potential” of the texts.

We appreciate and understand the sentiment of this comment, but feel that a meaningful analysis of the information potential of these narrative texts is beyond the scope of this paper and would benefit from a data science (e.g., natural language processing, topic modeling) approach. We have added emphasis on this in the Discussion (please see revised text).

2. P. 13: Given the possibility for coder bias, can the authors control for individual NVDRS coders and/or their length of experience? For example, if newly-added states to the NVDRS are more demographically diverse and less experienced coders are working on those narratives, it could skew results. I don’t know whether that’s the case but it’s one of several possibilities.

Thank you for this comment. We do not have data on individual coders/NVDRS staff, unfortunately. However, our new analytic approach of accounting for clustering within states using Generalized Estimating Equations (GEE, see response to Reviewer #1) should account for factors like coder experience that cluster within site. Even with this new analytic approach, we still observe substantial disparities in narrative length by race/ethnicity in these data.

3. There are also important differences in the background of medical examiners and coroners which may affect the original reports. The study controls for states, thus capturing potential state differences but beyond state controls, have they considered other ways to capture geographic differences in death investigation systems. E.g. https://www.cdc.gov/phlp/publications/coroner/death.html. Data on county of residence of the decedent are included in the NVDRS.

Thank you for this comment. We believe our new modeling approach (GEE) is an appropriate means of addressing factors (like centralized vs. decentralized death examination systems) that vary across states.

We also want to clarify that our intent in this analysis is to account for state (NVDRS site) clustering as an analytic issue, not to identify the reasons for that state clustering (which is a distinct research question). We want to characterize the NVDRS data archive as a whole as a means to inform future research. Therefore, we do not feel it is appropriate to examine sub-site factors (e.g., county of death) in this analysis.

Please also see our response to comment #6 below.

4. The authors might consider putting Tables 1 and 2 in the appendix and the regression analyses currently in the appendix in the body of the paper. Tables 1 and 2 should indicate whether the differences (e.g. between non-missing/missing and between C/ME and LE) are statistically significant.

The findings shown in Supplemental Tables 2 and 3 (regression tables) are identical to Figures 2 and 3 in the main text, and therefore we think including them in the main text would be redundant. In contrast, current Tables 1 and 2 provide a description of the narrative data in absolute terms (vs. the regression tables which only show relative differences) and therefore we feel they provide valuable information for the reader that is not present elsewhere in the Results. As such, we have not made this suggested change.

We have elected not to include p-values in Table 1 because the purpose of this table is to describe the sample, rather than to test any particular hypotheses, and it contains a lot of information already. The Supplemental Tables provide the 95% confidence intervals for all these comparisons while accounting for state clustering.

5. Regarding a dose response relationship between age and odds of a missing narrative “consistent for C/ME and LE texts” (p. 20, lines 244-246). According to the CI in Table 1, the effect of age on the C/ME missing is generally not significant, and there are no differences across the age groups in the effect. The patterns are different for LE missing.

Thank you for this comment. We have now corrected the text to reflect that the relationship between age and narrative missingness is limited to LE texts.

6. As alluded to in #2, the controls are interesting in their own right. For example, there are significant differences across states and over time in the patterns of missingness and narrative length. At a minimum, it would be useful to provide more discussion and suggestions for future research as to why these differences exist. Some of the between-state differences may relate to the points raised on page 24 (some could be tested explicitly – e.g. time in system).

While we agree with the sentiment of the comment, our goal is to draw attention to these patterns so that external researchers, like ourselves, who are interested in addressing substantive scientific questions with this archive can do so in a manner that accounts for the information bias we have identified.

In response to this comment we have added the ICCs that show the amount of clustering by state (see response to Reviewer #1, comment #1) and added language to the Discussion regarding suggestions for future research and collaborations between data creators and data users (see revised text, last paragraph of the Discussion).

7. The discussion would also benefit from further explication of possible reasons as to why these patterns exist. For examples, studies of the Census undercount and/or inaccuracies in other official records would provide insight. The authors mention this only in the last sentence of the paper. It would be useful to synthesize and relate some of the possible explanations for these patterns in other sources to this analysis to provide a richer interpretation of the findings.

While we agree with the sentiment of the comment, our goal is to draw attention to these patterns so that external researchers, like ourselves, who are interested in addressing substantive scientific questions with this archive can do so in a manner that accounts for the information bias we have identified.

Moreover, we do not feel that our analysis can test the reasons for the patterns we observe, as we note in our discussion of study limitations, and any comments we make as to their source would be speculative. That is, with the data we have we cannot capture important NVDRS system factors (e.g., updates to the data abstraction dashboard such as introducing new variables or changing the coding of variables, changes in training, changes in staffing, new initiatives prioritizing data collection on certain groups) that may contribute to these patterns. Other stakeholders (individual sites, the CDC) likely have data on these elements that could address these questions. Therefore, we feel collaboration with CDC and state NVDRS staff is the most effective means of identifying the reasons for these patterns - and addressing them in the future.

In response to this comment, we have added additional text to the Discussion on the need for collaboration between data creators and data users in the NVDRS to maximize the scientific utility of this archive. Please see the final paragraph of the Discussion.

Attachment

Submitted filename: PLOS One Response Letter.docx

Decision Letter 1

Ellen L Idler

26 May 2021

PONE-D-21-00409R1

Not discussed: Inequalities in narrative text data for suicide deaths in the National Violent Death Reporting System

PLOS ONE

Dear Dr. Mezuk,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Both of the reviewers and I agree that most of the issues raised in the first round of reviews have been well-addressed, and that the manuscript is much improved. Each reviewer, however, raises one or more remaining but minor issues that would further improve the paper. Reviewer 1 would like to have a fuller disclosure of the data sources to which you did or did not have access -- this would be a very helpful step for future research in the area. Reviewer 2 recommends addressing the missingness of the decedent characteristic of education, since it is higher than the other characteristics, and suggests a sensitivity analysis excluding deaths of undetermined cause. Please either make these changes or explain why you are not doing so.

Overall, however, the paper makes a strong contribution, and the Appendix with text examples is particularly enlightening.

Please submit your revised manuscript by Jul 10 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Ellen L. Idler

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This version represents a substantial improvement over the original, and addresses all of my major concerns with the first version. I have only one lingering concern. While I understand and support the authors' decision to not speculate on the mechanisms behind narrative length, I still think it needs to be more explicitly stated, earlier than in the discussion, that narratives are generally missing because the underlying document/information needed to complete a narrative was missing--and that the presence of these documents can be ascertained in other ways. In my initial review, I assumed the authors had access to the LE/CME specific variables that the NVDRS collects (such as toxicology, death circumstances, etc), which would be an easy way to check the assumption that a lack of a narrative implies a lack of (access to) a CME/LE, but the writing suggests that those are not available to the authors. I thought they were available in the restricted-access version of the NVDRS data. They are correct that death certificates do not provide the needed information--in our state, even the "autopsy performed" variable does not necessarily reflect whether or not our team had access to an autopsy. If it is not possible within the scope of the authors' data access to adjust for the presence of CME/LE data, it should at least be noted that the data exists, and could be utilized by someone with different permissions or working with a different time period. If the authors did have access to these variables and chose not to use them, a more robust explanation of the rationale is needed. I think it would be appropriate to consider these variables even if they do not exist for portions of the study period (as NVDRS data collection does change frequently, as the authors noted). Some information on the correlation between missing source data and narrative length would be so helpful.

I think that these concerns could be addressed briefly in the text. The discussion surrounding state variation was great, and already touches on some of the reasons why abstractors may not have access to these documents, so it shouldn't take much tweaking to address the other variables. Either way, I think the narratives represent a better aggregate measure of data availability than any one variable--the discussion surrounding narrative length was particularly interesting. I also agree with Reviewer 2 and wish that the regression tables were available in the main text. I find them easier to interpret than the forest plots. However, that's more of a personal preference than a true problem with the paper.

Reviewer #2: The authors have done a good job of addressing concerns raised in the first set of reviews. I have just a couple of additional points of clarification:

1. Although the level of missingness for decedent characteristics other than the presence of a narrative is generally low, that is not the case for education (25-30% of cases lack this information). Given this high level of missingness and the fact that including a dummy variable to indicate missing status can lead to biased estimates (Paul Allison and others), I recommend that the authors use multiple imputation instead.

2. Did the authors conduct a sensitivity analysis to determine whether the substantive conclusions are changed if the deaths of undetermined cause are excluded? This would be a worthwhile check, and results could simply be reported in the text.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Decision Letter 2

Ellen L Idler

28 Jun 2021

Not discussed: Inequalities in narrative text data for suicide deaths in the National Violent Death Reporting System

PONE-D-21-00409R2

Dear Dr. Mezuk,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Ellen L. Idler

Academic Editor

PLOS ONE

Acceptance letter

Ellen L Idler

7 Jul 2021

PONE-D-21-00409R2

Not discussed: Inequalities in narrative text data for suicide deaths in the National Violent Death Reporting System

Dear Dr. Mezuk:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Professor Ellen L. Idler

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 Appendix. Exploring the “information potential” of short and long narrative texts.

    (DOCX)

    S1 Table. Logistic regression of missing status for NVDRS narratives abstracted from coroner/medical examiner and law enforcement reports.

    (DOCX)

    S2 Table. Logistic regression of missing status for NVDRS narratives abstracted from coroner/medical examiner and law enforcement reports: Sensitivity analysis excluding sites with <5 missing narratives.

    (DOCX)

    S3 Table. Quasi-Poisson regression of character length of NVDRS narratives predicted by demographic characteristics.

    (DOCX)

    S4 Table. Quasi-Poisson regression of character length of NVDRS narratives predicted by demographic characteristics, excluding outliers (longest 1% of narratives).

    (DOCX)

    S5 Table. Sensitivity analyses for logistic regression of missing status for NVDRS narratives abstracted from coroner/medical examiner (CME) reports.

    (DOCX)

    S6 Table. Sensitivity analyses for logistic regression of missing status for NVDRS narratives abstracted from law enforcement (LE) reports.

    (DOCX)

    S7 Table. Sensitivity analyses of Quasi-Poisson regression of character length of NVDRS narratives abstracted from coroner/medical examiner (CME) reports.

    (DOCX)

    S8 Table. Sensitivity analyses of Quasi-Poisson regression of character length of NVDRS narratives abstracted from law enforcement (LE) reports.

    (DOCX)

    Attachment

    Submitted filename: PLOS One Response Letter.docx

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    The narrative data used in this analysis are available by request from the CDC through their restricted-access data process. Our use of these restricted-access NVDRS data is governed by a Data Use Agreement (DUA) with the CDC. This DUA legally prohibits us from sharing these data with outside investigators. Any investigator can gain access to these restricted-access NVDRS data by contacting nvdrs-rad@cdc.gov and following the procedures outlined here: https://www.cdc.gov/violenceprevention/datasources/nvdrs/dataaccess.html. Other NVDRS data are publicly available: https://www.cdc.gov/violenceprevention/datasources/nvdrs/datapublications.html. Cells with <5 observations have been suppressed in this publication, as required by the NVDRS Data Use Agreement.


    Articles from PLoS ONE are provided here courtesy of PLOS
