Author manuscript; available in PMC: 2023 Jun 13.
Published in final edited form as: Traffic Inj Prev. 2022 Jun 13;23(sup1):S130–S136. doi: 10.1080/15389588.2022.2083612

Improving identification of crash injuries: Statewide integration of hospital discharge and crash report data

Leah R Lombardi a, Melissa R Pfeiffer a, Kristina B Metzger a, Rachel K Myers a,b, Allison E Curry a,b
PMCID: PMC9744954  NIHMSID: NIHMS1837627  PMID: 35696334

Abstract

Objective:

The availability of complete and accurate crash injury data is critical to prevention and intervention efforts. Relying solely on hospital discharge data or police crash reports may result in a biased undercount of injuries. Linking hospital data with crash reports may allow for a more robust identification of injuries and an understanding of which populations may be missed in an analysis of one source. We used the New Jersey Safety and Health Outcomes (NJ-SHO) data warehouse to examine the share of the entire crash-injured population identified in each of the two data sources, overall and by age, race/ethnicity, sex, injury severity, and road user type.

Methods:

We utilized 2016-2017 data from the NJ-SHO warehouse. We identified crash-involved individuals in hospital discharge data by applying the ICD-10-CM external cause of injury matrix. Among crash-involved individuals, we identified those with injury- or pain-related diagnosis codes as being injured. We also identified crash-involved individuals via crash report data and identified injuries using the KABCO scale. We jointly examined the two sources; injuries in the hospital discharge data were documented as being related to the same crash as injuries found in the crash report data if the date of the crash report preceded the date of hospital admission by no more than two days.

Results:

In total, there were 262,338 crash-involved individuals with a documented injury in the hospital discharge data or on the crash report during the study period; 168,874 had an injury according to hospital discharge data, and 164,158 had an injury in crash report data. Only 70,694 (26.9%) had an injury in both sources. We observed differences by age, race/ethnicity, injury severity, and road user type: hospital discharge data captured a larger share of those ages 65+, those who were Black or Hispanic, those with higher severity injuries, and those who were bicyclists or motorcyclists.

Conclusions:

Each data source in isolation captures approximately two-thirds of the entire crash-injured population; one source alone misses approximately one-third of injured individuals. Each source undercounts people in certain groups, so relying on one source alone may not allow for tailored prevention and intervention efforts.

Keywords: Motor vehicle crashes, data integration, injuries, International Classification of Diseases, hospital discharge data, police crash reports

INTRODUCTION

The availability of complete and accurate injury data is critical to ensuring that intervention and prevention efforts are appropriately tailored and effective in mitigating motor vehicle crash injuries. Many stakeholders rely on hospital discharge data or crash reports completed by responding officers, which should include injury status for all crash-involved individuals. However, when researchers depend solely on either source to conduct a study on crash injuries, the results may undercount the number of injuries and may be biased regarding the characteristics of those who were injured. For example, although hospital discharge data contain detailed information on injury location, multiple types of injuries, and injury severity, they do not include factors from the crash scene that may be associated with injury, such as crash and vehicle type or restraint use. Reliance on hospital data may also omit potentially minor injuries treated in other types of medical settings, such as primary care, or those injuries for which no medical care is sought. Injury data from crash reports are also limited, as they are typically restricted to broad classifications based on observed severity and are confined to the indication of a single most severe/apparent injury (U.S. Department of Transportation Federal Highway Administration 2016). Additionally, many injuries may not be outwardly visible to responding officers and can be diagnosed only in a clinical setting; crash data may classify someone as uninjured who later required medical care (Burch et al. 2014) or underestimate injury severity (Kamaluddin et al. 2019). Finally, some crash-involved individuals, including vulnerable road users like bicyclists and motorcyclists or those from marginalized populations, may be incompletely captured by crash report data; that is, some types of road users and populations may be more likely to have injury and sociodemographic data not missing at random (Labgold et al. 2021; Perkins et al. 2018; Sartin et al. 2021; Watson et al. 2015). For example, previous work has found that White individuals may be more likely to initiate contact with police for non-crime emergencies, including crashes, than those who are Black or Hispanic (Davis et al. 2018); however, the hospital admission rate for crash injuries may be higher among those who are Black or Hispanic compared with those who are White (Hamann et al. 2020).

Thus, examining hospital discharge or crash report data separately may bias our interpretations of injury risk, especially among specific populations. This not only limits our ability to classify the true magnitude of crash injuries to identify potential disparities or intervenable factors contributing to them (Labgold et al. 2021; Sartin et al. 2021), but also skews data informing crash injury mitigation efforts. To date, few US studies have linked hospital discharge and crash report data to assess the extent to which crash injuries—overall and by characteristics of injured individuals—are captured by each source (Conderino et al. 2017). Therefore, our long-term goal is to develop novel sources of data that overcome these biases and limitations to better inform prevention strategies. As a first step toward this long-term goal, the objective of this study was to leverage an integrated data source to characterize the share of the injured population documented by hospital discharge and crash report data. Specifically, we determined the extent to which each source captured the entire injured population, both overall and by age, race/ethnicity, sex, injury severity, and road user type, with the goal of identifying specific populations that might be underrepresented by analyzing one source alone.

METHODS

NJ-SHO Data Warehouse

We analyzed 2016-2017 data from the New Jersey Safety and Health Outcomes (NJ-SHO) data warehouse, a unique source that contains integrated data from numerous complex administrative databases. The NJ-SHO warehouse was developed via an individual-level probabilistic linkage of driver licensing histories, traffic-related citations and suspensions, police-reported crashes, birth certificates, death certificates, and hospital discharges; full details of its development and evaluation can be found elsewhere (Curry et al. 2021). This person-specific linkage allowed us to identify individuals represented in hospital discharge data (obtained from the NJ Department of Health), crash report data (obtained from the NJ Department of Transportation), or both sources. Additional details about crash-involved individuals—including age, race/ethnicity, and sex—were derived from all linked sources in the NJ-SHO warehouse. Notably, although race/ethnicity are typically available in hospital data, crash reports do not include race/ethnicity. Consequently, if an individual linked to a known race/ethnicity value (White, Black, Hispanic, Other) from a data source in the NJ-SHO warehouse, we assigned that value to the individual. If the individual did not have a known race/ethnicity, we applied their probabilities for the categories White, Black, Hispanic, and Other from the Bayesian Improved Surname Geocoding (BISG) algorithm, derived from their surname and geocoded residential address. We described, validated, and addressed limitations of this approach in previous work (Sartin et al. 2021).
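The race/ethnicity assignment rule lends itself to a short illustration of how estimated counts can arise. The sketch below assumes that a known value contributes a full count of 1 and that BISG probabilities are summed across individuals without a known value; the record layout, field names, and sample probabilities are hypothetical and are not taken from the NJ-SHO warehouse.

```python
from collections import defaultdict

CATEGORIES = ("White", "Black", "Hispanic", "Other")

def estimated_counts(people):
    """Sum race/ethnicity contributions across people.

    Assumption: a known value counts as 1.0 for that category; otherwise
    the person's BISG probabilities are spread across the categories.
    """
    totals = defaultdict(float)
    for person in people:
        known = person.get("known_race_ethnicity")
        if known is not None:
            totals[known] += 1.0
        else:
            for category in CATEGORIES:
                totals[category] += person["bisg_probabilities"].get(category, 0.0)
    return {category: totals[category] for category in CATEGORIES}

# Hypothetical records: one known value, one with BISG probabilities only
people = [
    {"known_race_ethnicity": "Black"},
    {
        "known_race_ethnicity": None,
        "bisg_probabilities": {"White": 0.6, "Black": 0.1, "Hispanic": 0.2, "Other": 0.1},
    },
]
counts = estimated_counts(people)
```

Because probabilities are summed, the resulting category counts need not be whole numbers, which is why such counts are described as estimated.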

Hospital Discharge Data

Hospital discharge data include information from inpatient, outpatient, and emergency department visits, including admission date and International Classification of Diseases 10th Revision, Clinical Modification (ICD-10-CM) diagnosis codes. We applied the 2019 ICD-10-CM external cause of injury matrix to all codes for each hospital discharge record to identify crash involvement (Hedegaard et al. 2019). We classified individuals with an external cause of injury of “motor vehicle-traffic” according to the matrix or those with an ICD-10-CM diagnosis code for “examination following transport accidents” (Z04.1) as being involved in a crash; these codes are hereafter referred to as MVT-codes (Table S1). We considered the earliest hospital discharge record that indicated crash involvement as the first date of a three-day crash period; all hospital records with an MVT-code within that three-day period (hereafter referred to as the “crash event period”) were considered to reflect the same crash.
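The crash event period rule can be sketched as follows. This is a minimal illustration rather than the authors' SAS code; it assumes the three-day period covers the first MVT-coded date plus the two following days, and the function and variable names are hypothetical.

```python
from datetime import date, timedelta

def crash_event_periods(mvt_record_dates, period_days=3):
    """Group one person's MVT-coded record dates into crash event periods.

    Assumption: a period spans its first date plus the next
    (period_days - 1) days; a later record opens a new period.
    Returns a list of (period_start, dates_in_period) tuples.
    """
    periods = []
    for record_date in sorted(mvt_record_dates):
        if periods and record_date < periods[-1][0] + timedelta(days=period_days):
            periods[-1][1].append(record_date)            # same crash event
        else:
            periods.append((record_date, [record_date]))  # new crash event
    return periods

# Hypothetical person with three MVT-coded hospital records
periods = crash_event_periods([date(2016, 3, 4), date(2016, 3, 1), date(2016, 3, 3)])
```

In this example the records dated March 1 and March 3 fall in one crash event period, while the March 4 record starts a second period.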

We further used MVT-codes to identify each individual’s road user type: driver, passenger, pedestrian, bicyclist, motorcyclist, or unspecified/other. If there was more than one MVT-code in the crash event period, we combined information regarding road user type. If all MVT-codes indicated the same specified (i.e., not unspecified/other) type of road user, the individual was classified as that specific type. If any MVT-codes indicated different specified types of road users (e.g., both driver and passenger), the individual was classified as unspecified/other.
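A minimal sketch of this combination rule is shown below. The category labels are hypothetical strings, and one case is an assumption on our part: when a specified type appears alongside unspecified/other codes, the sketch keeps the specified type, which the text implies but does not state outright.

```python
SPECIFIED_TYPES = {"driver", "passenger", "pedestrian", "bicyclist", "motorcyclist"}

def combine_road_user_types(types_in_period):
    """Combine road user types from all MVT-codes in a crash event period."""
    specified = {t for t in types_in_period if t in SPECIFIED_TYPES}
    if len(specified) == 1:
        # All specified codes agree (unspecified codes are ignored: assumption)
        return next(iter(specified))
    # No specified type, or conflicting specified types
    return "unspecified/other"

same_type = combine_road_user_types(["driver", "driver"])
conflict = combine_road_user_types(["driver", "passenger"])
mixed = combine_road_user_types(["driver", "unspecified/other"])
```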

Among crash-involved individuals identified through MVT-codes, we examined all ICD-10-CM diagnosis codes included in records within the crash event period to classify whether they were injured and injury severity. Crash-involved individuals with an injury-related diagnosis code within the ICD-10-CM chapters “S” and “T” (except frostbite, poisoning, toxic effects, unspecified/other external causes, and complications of surgical or medical care) were considered injured (Table S2; Injury Surveillance Workgroup 2016). We also included pain-related ICD-10-CM codes; these may be important for a full understanding of individuals seeking medical attention. We mapped each injury diagnosis code to the Association for the Advancement of Automotive Medicine’s Abbreviated Injury Scale (AIS) to classify injury severity (minor [1], moderate [2], serious [3], severe [4], critical [5], or maximum [6]) during the crash event period (AAAM 2008; Glerum & Zonfrillo 2019; Loftis et al. 2016). An individual’s injury severity for a given crash was the maximum AIS score across all hospital discharge records during the crash event period; we chose maximum AIS rather than initial AIS to allow for a more comprehensive evaluation of injury severity.
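The maximum-AIS rule can be illustrated with a few lines of code. The sketch assumes each diagnosis code has already been mapped to an AIS score, with `None` standing in for codes that do not map (for example, pain-related codes); the function name and sample values are hypothetical.

```python
def max_ais_for_event(ais_scores):
    """Return the maximum AIS score across a crash event period.

    ais_scores: one entry per injury or pain diagnosis code in the period,
    an int from 1 to 6, or None when the code does not map to AIS.
    Returns None when nothing in the period maps to AIS.
    """
    mapped = [score for score in ais_scores if score is not None]
    return max(mapped) if mapped else None

# Hypothetical event: AIS 1 at the first visit, AIS 3 at a follow-up visit
event_severity = max_ais_for_event([1, None, 3])
# Hypothetical event identified by a pain-related code alone
pain_only = max_ais_for_event([None])
```

Taking the maximum rather than the first score means an injury diagnosed at a later visit within the period still determines the individual's severity.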

Crash Report Data

Motor vehicle crashes are reportable in NJ if an injury or more than $500 in property damage occurred (State of New Jersey Motor Vehicle Commission 2017). Responding law enforcement officers document information on the crash date, circumstances, and all individuals involved in the crash, including their physical condition and road user type (i.e., driver, passenger, pedestrian, bicyclist, motorcyclist) (State of New Jersey Motor Vehicle Commission 2017). Responding officers also record the location and type of the most severe physical injury. Injuries documented in the physical condition field can be translated to the KABCO injury classification scale, which is the standard according to the Model Minimum Uniform Crash Criteria (MMUCC) (NHTSA 2012). We categorized crash-involved individuals as injured if their physical condition was noted as killed (K equivalent), incapacitated (A equivalent), moderate injury (B equivalent), or complaint of pain (C equivalent); individuals with unknown physical condition were grouped with the uninjured individuals (O equivalent) (U.S. Department of Transportation Federal Highway Administration 2016). Rarely, individuals had more than one crash documented within a three-day window; we examined data from the first crash report within that time period to allow for consistent comparison with the crash event period in hospital discharge data.
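The translation from the physical condition field to KABCO equivalents can be sketched as a lookup. The string labels below are hypothetical stand-ins for the values on the NJ crash report, which may be coded differently; unknown values fall through to the uninjured group, as described above.

```python
KABCO_EQUIVALENT = {
    "killed": "K",
    "incapacitated": "A",
    "moderate injury": "B",
    "complaint of pain": "C",
}

def kabco(physical_condition):
    """Map a physical condition label to its KABCO equivalent.

    Unknown or unlisted conditions are grouped with uninjured ("O").
    """
    return KABCO_EQUIVALENT.get(physical_condition, "O")

def is_injured(physical_condition):
    """An individual is injured if the KABCO equivalent is K, A, B, or C."""
    return kabco(physical_condition) in {"K", "A", "B", "C"}

pain_is_injury = is_injured("complaint of pain")
unknown_is_injury = is_injured("unknown")
```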

Joining Crash Events Among Injured Individuals in Either Source

After identifying crash-involved individuals and subsequently identifying those with injuries, we jointly examined injured individuals in the two data sources to identify the same crash event in each source. We characterized crash events identified through hospital data and crash reports as being related to the same crash for the same individual if the 3-day crash event period from the hospital data began on the same day as, or up to two days after, the initial crash date from the crash report’s 3-day event period. These represent crash-involved individuals with a documented injury in both the hospital data and on the crash report (Both data sources [B]). Injured individuals who either 1) had a crash event in the hospital data that began before, or three or more days after, the initial crash date in the crash report data or 2) had no crash report were considered injured only in the hospital data (Hospital discharge data only [H]). Similarly, individuals with crash reports that did not join with a crash event in the hospital data within three days of the crash report date were considered injured in the crash reports only (Crash report data only [C]).
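The joining rule can be expressed compactly in code. This is a sketch under the assumption that each individual contributes at most one hospital crash event period and one crash report to the comparison; the function names are hypothetical.

```python
from datetime import date

def same_crash(hospital_period_start, crash_date):
    """True if the hospital period begins 0 to 2 days after the crash date."""
    gap_days = (hospital_period_start - crash_date).days
    return 0 <= gap_days <= 2

def classify_events(hospital_period_start=None, crash_date=None):
    """Return the group labels produced by one person's injured events.

    "B": joined in both sources; "H": hospital only; "C": crash report only.
    Events that do not join are counted separately in their own source.
    """
    if hospital_period_start is not None and crash_date is not None:
        if same_crash(hospital_period_start, crash_date):
            return ("B",)
        return ("H", "C")
    if hospital_period_start is not None:
        return ("H",)
    if crash_date is not None:
        return ("C",)
    return ()

joined = classify_events(date(2017, 5, 12), date(2017, 5, 10))    # 2 days after
too_late = classify_events(date(2017, 5, 13), date(2017, 5, 10))  # 3 days after
before = classify_events(date(2017, 5, 9), date(2017, 5, 10))     # 1 day before
```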

Concordance Analyses

We assessed the degree of agreement—or concordance—between crash reports and hospital data in injury severity and road user type. For those with joined crash events (B), we explored 1) whether road user type as documented in the hospital data was the same as that documented in the crash report data and 2) how injuries at each severity level in hospital data were classified in crash reports. We categorized higher severity injuries as being AIS 3+ in the hospital data and as incapacitated or killed in the crash report data; other injuries were considered lower severity (Zonfrillo et al. 2015). We conducted these assessments to better understand consistency in documentation between the two sources. Further, we compared the distribution of age group (0-7, 8-16, 17-34, 35-64, 65+), race/ethnicity, sex, road user type, and injury severity (higher, lower, unmappable) among those injured individuals identified through hospital and crash report data using chi-square statistics.
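The higher/lower severity grouping used for concordance can be sketched as two small classifiers, one per source. The physical condition labels are hypothetical strings, and `None` represents a hospital record with no AIS-mappable code.

```python
CRASH_HIGHER = {"killed", "incapacitated"}
CRASH_LOWER = {"moderate injury", "complaint of pain"}

def hospital_severity_group(max_ais):
    """AIS 3+ is higher severity; AIS 1-2 is lower; None is unmappable."""
    if max_ais is None:
        return "unmappable"
    return "higher" if max_ais >= 3 else "lower"

def crash_severity_group(physical_condition):
    """Incapacitated or killed is higher severity; other injuries are lower."""
    if physical_condition in CRASH_HIGHER:
        return "higher"
    if physical_condition in CRASH_LOWER:
        return "lower"
    raise ValueError(f"not an injured physical condition: {physical_condition!r}")

def severity_concordant(max_ais, physical_condition):
    """True when both sources place the injury in the same severity group."""
    return hospital_severity_group(max_ais) == crash_severity_group(physical_condition)

agree = severity_concordant(4, "incapacitated")
disagree = severity_concordant(3, "complaint of pain")
```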

Share of Injured Population Documented by Each Source

We calculated the extent to which a single source captured the total injured population, which was the sum of the number of injured individuals identified only through hospital data, the number identified only through crash reports, and the number identified through joined crash reports and hospital data (Total injured population = H + C + B) (Elvik & Mysen 1999). We calculated the share of the injured population identified via hospital data as (H + B)/(H + C + B) and the share of the injured population identified via crash reports as (C + B)/(H + C + B).
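As a worked example of these formulas, the snippet below plugs in the overall counts reported in the Results (168,874 injured in hospital data, 164,158 injured in crash report data, and 70,694 injured in both). The variable names are ours.

```python
injured_hospital = 168_874   # H + B
injured_crash = 164_158      # C + B
injured_both = 70_694        # B

hospital_only = injured_hospital - injured_both        # H
crash_only = injured_crash - injured_both              # C
total_injured = hospital_only + crash_only + injured_both  # H + C + B

share_hospital = (hospital_only + injured_both) / total_injured
share_crash = (crash_only + injured_both) / total_injured

print(f"H = {hospital_only:,}, C = {crash_only:,}, total = {total_injured:,}")
print(f"hospital share = {share_hospital:.1%}, crash report share = {share_crash:.1%}")
```

Because B appears in both numerators, the two shares sum to more than 100%; the excess equals the share of the injured population found in both sources.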

After analyzing injuries overall, we stratified by injury severity and road user type identified by each source. We first classified severity and road user type in each source independently, particularly to understand differences in the population captured by each source alone (i.e., H + B or C + B). For the entire injured population (H + C + B), we harmonized data according to the following: if the injury severity differed between the crash report and hospital data, we assigned the value from the hospital data, assuming that the hospital data are more accurate (Burch et al. 2014). If the road user type indicated by the two sources differed, we assigned the value from the crash report to the individual, assuming that the responding officer would provide more accurate information. This allowed us to understand, for example, what proportion of all higher severity injuries or all drivers were captured by crash report data versus hospital data. Finally, we examined the share of records in each source stratified by age group, race/ethnicity, and sex.
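The harmonization rule for the entire injured population can be sketched as follows, assuming each source's record is a small dictionary (a hypothetical layout) and that a value is taken from whichever source is available when the individual appears in only one.

```python
def harmonize(hospital=None, crash=None):
    """Pick severity and road user type for the H + C + B population.

    Severity comes from hospital data when the person is in that source;
    road user type comes from the crash report when the person is in that
    source. At least one of the two records must be provided.
    """
    if hospital is None and crash is None:
        raise ValueError("at least one source record is required")
    severity = hospital["severity"] if hospital is not None else crash["severity"]
    road_user = crash["road_user"] if crash is not None else hospital["road_user"]
    return {"severity": severity, "road_user": road_user}

# Hypothetical joined individual with disagreeing sources
joined = harmonize(
    hospital={"severity": "higher", "road_user": "unspecified/other"},
    crash={"severity": "lower", "road_user": "driver"},
)
hospital_only = harmonize(hospital={"severity": "lower", "road_user": "bicyclist"})
```

Prioritizing the crash report for road user type is what moves some individuals out of the unspecified/other group in the overall population.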

This study was approved by the Children’s Hospital of Philadelphia’s Institutional Review Board (IRB 11-008136). Analyses were conducted using SAS software version 9.4 (SAS Institute Inc., Cary, NC, USA).

RESULTS

Study Cohort Selection

Overall, there were 183,537 crash-involved individuals in the hospital data during the study period, as identified by an MVT-code; of these, 168,874 had an injury (H + B) (Figure S1). There were 1,188,959 crash-involved individuals documented in crash reports; of these, 164,158 had an injury (C + B). After joining injured individuals in the two sources, we identified 262,338 crash-involved individuals with a documented injury in the hospital data alone, the crash report data alone, or in both data sources for a given crash event (H + C + B).

Demographics and characteristics

Table 1 presents the distribution of demographic characteristics for those with crash injuries. Although most injured individuals were 17-34 years old (74.1% of those in hospital data and 73.2% of those in crash report data), a slightly larger proportion of the injured individuals in the hospital data (H+B) were 65+ (5.7%) than in the crash report data (C+B) (3.7%). The distribution of race/ethnicity also differed between the two sources: when comparing hospital data with crash report data, a larger proportion of injured individuals were Black (23.3% vs. 17.7%) or Hispanic (30.3% vs. 26.0%); conversely, a larger proportion of those in the crash report data were White (46.0%) than in the hospital data (36.7%). About half of injured individuals in both the hospital data and crash report data were female (54.1% vs. 53.0%, respectively) (Table S3).

Table 1.

Distribution of demographic characteristics (N (%)) among injured individuals involved in a crash, identified through hospital discharge data and crash report data, New Jersey, 2016-2017

| Characteristic | Category | Hospital discharge data (H + B), N = 168,874 | Crash report data (C + B), N = 164,158 | p-value |
| --- | --- | --- | --- | --- |
| Age¹ | 0-7 | 3,079 (1.8) | 2,998 (1.8) | 0.95 |
| | 8-16 | 14,818 (8.8) | 13,074 (8.0) | <0.001 |
| | 17-34 | 125,106 (74.1) | 120,068 (73.2) | <0.001 |
| | 35-64 | 16,169 (9.6) | 17,662 (10.8) | <0.001 |
| | 65+ | 9,702 (5.7) | 6,027 (3.7) | <0.001 |
| Race/ethnicity² | Black | 39,348 (23.3) | 29,056 (17.7) | <0.001 |
| | Hispanic | 51,169 (30.3) | 42,681 (26.0) | <0.001 |
| | White | 61,977 (36.7) | 75,513 (46.0) | <0.001 |
| | Other | 16,381 (9.7) | 16,744 (10.2) | <0.001 |

¹ 4,329 (2.6%) of those in crash report data had unknown age.

² If an individual did not have a known race/ethnicity value in the NJ-SHO warehouse, we applied their probabilities for each category from the BISG algorithm. We present estimated counts.
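To illustrate the kind of chi-square comparison behind the p-value column, the sketch below computes a Pearson chi-square statistic (1 degree of freedom, no continuity correction) for a single category versus all others across the two sources. This setup is an assumption: the text does not describe how each test was configured, and the calculation treats the two columns as independent samples even though individuals in B appear in both. The function name is hypothetical.

```python
import math

def chi_square_2x2(count_a, total_a, count_b, total_b):
    """Pearson chi-square for two proportions (1 df, no continuity correction).

    Returns (statistic, p_value). For 1 df, the upper-tail probability of a
    chi-square value x equals erfc(sqrt(x / 2)).
    """
    a, b = count_a, total_a - count_a
    c, d = count_b, total_b - count_b
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Ages 0-7: 3,079 of 168,874 in hospital data vs. 2,998 of 164,158 in crash report data
statistic, p_value = chi_square_2x2(3_079, 168_874, 2_998, 164_158)
print(f"chi-square = {statistic:.4f}, p = {p_value:.2f}")
```

Under these simplifying assumptions, the result for the 0-7 age group comes out close to the 0.95 reported in Table 1.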

Table 2 presents the distribution of injury severity and road user type. Few injuries were higher severity: 2.5% of those in the hospital data, and 2.0% in the crash report data. Most injuries were lower severity: 77.7% in hospital data, with an additional 19.8% unmappable to AIS (note that 30,873 individuals were identified by pain-related ICD-10-CM codes alone that do not map to AIS), and 98.0% in crash report data. Though only 46.5% of those in hospital data were documented as drivers, 69.7% were drivers in crash report data. Additionally, a larger proportion of those in hospital data were bicyclists or motorcyclists than those in crash report data: 3.0% versus 1.8% for bicyclists, and 2.8% versus 1.9% for motorcyclists. Nearly one-quarter (24.4%) of those in hospital data were unspecified/other types of road user.

Table 2.

Distribution of injury severity and road user type (N (%)) among injured individuals involved in a crash, identified through hospital discharge data and crash report data, New Jersey, 2016-2017

| Characteristic | Category | Hospital discharge data (H + B), N = 168,874 | Crash report data (C + B), N = 164,158 | p-value |
| --- | --- | --- | --- | --- |
| Severity¹ | Higher severity | 4,156 (2.5) | 3,305 (2.0) | <0.001 |
| | Lower severity | 131,287 (77.7) | 160,853 (98.0) | <0.001 |
| | Unmappable to AIS² | 33,431 (19.8) | - | - |
| Road user | Driver | 78,545 (46.5) | 114,491 (69.7) | <0.001 |
| | Passenger | 30,396 (18.0) | 35,110 (21.4) | <0.001 |
| | Pedestrian | 8,617 (5.1) | 8,458 (5.2) | 0.52 |
| | Bicyclist | 5,141 (3.0) | 2,988 (1.8) | <0.001 |
| | Motorcyclist | 4,754 (2.8) | 3,111 (1.9) | <0.001 |
| | Unspecified/Other | 41,421 (24.4) | - | - |

¹ Higher severity injuries were classified as AIS 3+ in the hospital discharge data and as incapacitated or killed in the crash report data. AIS <3 injuries in the hospital discharge data and moderate injury or complaint of pain in the crash report data were considered lower severity.

² 30,873 injured individuals were identified using pain-related codes alone; these are unmappable to AIS.

Concordance in Injury Severity and Road User Type

Table 3 presents concordance in injury severity among the injured individuals with joined crash events in hospital data and crash report data (B, N = 70,694). Approximately three-fourths of injured individuals with AIS 3+ injuries in the hospital data joined with a crash report indicating that they were incapacitated/killed (30.5%) or had a moderate injury (44.4%). Most of those with AIS <3 injuries joined with a crash report that had a complaint of pain (80.4%) as the most severe injury. Though 11,955 individuals in both sources were unmappable to AIS in the hospital data, most of these unmappable values were pain-related ICD-10-CM codes (11,748 among those with joined crash events); the majority of these unmappable injuries joined to a complaint of pain in the crash report data (93.7%).

Table 3.

Concordance in injury severity as documented by hospital discharge data and crash report data (N (%)), among those with joined crash events (N = 70,694)

| Injury severity in hospital discharge data | Crash report: Incapacitated/killed | Crash report: Moderate injury | Crash report: Complaint of pain | Totals |
| --- | --- | --- | --- | --- |
| AIS 3+ | 719 (30.5) | 1,048 (44.4) | 593 (25.1) | 2,360 |
| AIS < 3 | 707 (1.3) | 10,327 (18.3) | 45,345 (80.4) | 56,379 |
| Unmappable to AIS¹ | 46 (0.4) | 710 (5.9) | 11,199 (93.7) | 11,955 |

¹ 11,748 unmappable values for those with a joined crash event were pain-related codes in the hospital discharge data.

Table 4 presents concordance in road user type among the injured individuals with joined crash events in hospital discharge data and crash report data. The majority of those with documented road user type in the hospital data were the same road user type as in crash report data. For example, 91.2% of pedestrians in crash report data were documented as pedestrians in hospital discharge data. However, some injured individuals in the crash report data were documented as unspecified/other road user in the hospital data, particularly drivers (21.9%) and passengers (23.3%).

Table 4.

Concordance in road user type as documented by hospital discharge data and crash report data (N (%)), among those with joined crash events (N = 70,694)

| Road user type in crash report data | Hospital discharge data: Same road user type | Hospital discharge data: Different road user type | Hospital discharge data: Unspecified/other road user type | Totals |
| --- | --- | --- | --- | --- |
| Driver | 39,657 (76.8) | 669 (1.3) | 11,291 (21.9) | 51,617 |
| Passenger | 9,599 (70.9) | 791 (5.8) | 3,154 (23.3) | 13,544 |
| Pedestrian | 2,525 (91.2) | 59 (2.1) | 185 (6.7) | 2,769 |
| Bicyclist | 816 (79.0) | 167 (16.1) | 50 (4.8) | 1,033 |
| Motorcyclist | 1,610 (93.0) | 49 (2.8) | 72 (4.2) | 1,731 |

Share of Records Documented by Each Source

After identifying individuals with crash-related injuries according to each data source, including those identified as injured in both sources, we examined the share of records captured—and conversely missed—by each source. Overall, 64.4% of injuries were captured by hospital discharge data and 62.6% by crash report data. Table 5 presents the share of records captured by each source stratified by demographic characteristics. There were differences by age and race/ethnicity. This was particularly apparent for those ages 65+ years: hospital data captured a larger share of injuries among those ages 65+ (72.9%) than crash report data (45.3%). Additionally, although hospital data captured 71.8% of Black individuals and 69.2% of Hispanic individuals, crash report data only captured 53.0% of Black individuals and 57.7% of Hispanic individuals. However, crash report data and hospital data captured similar proportions of males and females (Table S4).

Table 5.

Share of records captured by each source (N (%)), stratified by demographics

| Characteristic | Category | Hospital discharge data (H + B, N = 168,874; 64.4%)¹ | Crash report data (C + B, N = 164,158; 62.6%)¹ | Entire population (H + C + B, N = 262,338) |
| --- | --- | --- | --- | --- |
| Age² | 0-7 | 3,079 (57.3) | 2,998 (55.8) | 5,376 |
| | 8-16 | 14,818 (68.6) | 13,074 (60.6) | 21,590 |
| | 17-34 | 125,106 (65.2) | 120,068 (62.6) | 191,935 |
| | 35-64 | 16,169 (62.7) | 17,662 (68.5) | 25,799 |
| | 65+ | 9,702 (72.9) | 6,027 (45.3) | 13,309 |
| Race/ethnicity³ | Black | 39,348 (71.8) | 29,056 (53.0) | 54,829 |
| | Hispanic | 51,169 (69.2) | 42,681 (57.7) | 73,979 |
| | White | 61,977 (57.8) | 75,513 (70.4) | 107,296 |
| | Other | 16,381 (62.4) | 16,744 (63.8) | 26,234 |

¹ Percentages in H + B are calculated using (H + B)/(H + C + B) and percentages in C + B are calculated using (C + B)/(H + C + B).

² 4,329 (2.6%) of those in crash report data had unknown age.

³ If an individual did not have a known race/ethnicity value in the NJ-SHO warehouse, we applied their probabilities for each category from the BISG algorithm. We present estimated counts.

Table 6 presents the share of records documented by each source stratified by injury severity and road user type. While hospital data captured a higher proportion of higher severity injuries (69.4%) compared with crash report data (55.2%), crash report data captured 72.2% of lower severity injuries compared with only 58.9% captured by hospital data. Additionally, crash report data captured a larger proportion of drivers (72.8%) than hospital data (49.9%); the same was seen for passengers (62.1% versus 53.8%). However, hospital discharge data captured a larger share of bicyclists and motorcyclists than crash report data: hospital data captured 70.6% of bicyclists and 79.0% of motorcyclists, while crash report data captured 41.0% of bicyclists and 51.7% of motorcyclists.

Table 6.

Share of records captured by each source (N (%)), stratified by injury severity and road user type

| Characteristic | Category | Hospital discharge data (H + B, N = 168,874; 64.4%)¹ | Crash report data (C + B, N = 164,158; 62.6%)¹ | Entire population (H + C + B, N = 262,338) |
| --- | --- | --- | --- | --- |
| Severity² | Higher severity | 4,156 (69.4) | 3,305 (55.2) | 5,989 |
| | Lower severity | 131,287 (58.9) | 160,853 (72.2) | 222,918 |
| | Unmappable to AIS | 33,431 (100) | - | 33,431 |
| Road user³ | Driver | 78,545 (49.9) | 114,491 (72.8) | 157,332 |
| | Passenger | 30,396 (53.8) | 35,110 (62.1) | 56,501 |
| | Pedestrian | 8,617 (59.9) | 8,458 (58.8) | 14,393 |
| | Bicyclist | 5,141 (70.6) | 2,988 (41.0) | 7,287 |
| | Motorcyclist | 4,754 (79.0) | 3,111 (51.7) | 6,014 |
| | Unspecified/Other | 41,421 (>100)⁴ | - | 20,811 |

¹ Percentages in H + B are calculated using (H + B)/(H + C + B) and percentages in C + B are calculated using (C + B)/(H + C + B).

² Column H + B uses severity from hospital discharge data only, and column C + B uses severity from crash report data only. For the H + C + B population, severity is taken from hospital discharge data if the individual was in that source.

³ Column H + B uses road user type from hospital discharge data only, and column C + B uses road user type from crash report data only. For the H + C + B population, road user type is taken from crash report data if the individual was in that source.

⁴ Since road user type as documented in the crash report data is prioritized for the H + C + B population, some individuals who are unspecified/other in the hospital discharge data become a specified road user type in the overall population. This leads to a percentage greater than 100 here.

DISCUSSION

We examined both hospital data and crash report data to (1) characterize the share of crash-injured individuals reported in each and (2) determine the extent to which specific populations were identified as injured in each source to better understand potential biases introduced by using one source alone. We found that hospital data alone and crash data alone each captured approximately two-thirds of all injuries. Hospital data captured a larger proportion of higher severity injuries than crash reports; conversely, crash reports captured a larger proportion of lower severity injuries. Further, we noted that a larger share of those ages 65+ or those who were Black or Hispanic were captured by hospital data than by crash report data. Importantly, hospital data may not adequately characterize road user type, as nearly one-quarter of injured individuals identified in hospital data could not be classified into a known type. When joined with crash report data, most of these same individuals were recorded as drivers or passengers on the crash report. Conversely, both bicyclists and motorcyclists appear to be underrepresented in crash reports compared with hospital discharges. Thus, an analysis relying on one of these sources alone not only undercounts crash injuries, but also underrepresents these specific groups.

These findings underline the strengths and weaknesses of crash report and hospital discharge data, corroborating the need for future data linkage efforts to appropriately classify the true magnitude of crash injuries and inform the development of mitigation efforts. Specifically, our results suggest that researchers and practitioners risk missing over one-third of crash-injured individuals when they analyze only one source of data. Differences in who is represented in each data source may arise from the source’s documentation approaches or varying data analysis-, individual-, and systemic-level factors. For example, since “complaint of pain” is considered an injury on crash reports as per KABCO, our inclusion of pain-related ICD-10-CM codes in hospital discharge data allowed us to identify an additional 30,873 individuals who may have been missed using only injury-related codes. Further, previous research has documented that road user type, injury severity, and sociodemographic factors may alter the setting and way in which crash injuries are documented or treated, a pattern that our results corroborate.

This study has several limitations. First, some individuals with injury-related hospital diagnosis codes may lack crash-related external cause of injury codes due to oversights or errors in medical documentation. Additionally, for individuals who did not link to an event in the crash report data and who had multiple crash-related hospital records on different dates within a short period of time, it is difficult to ascertain whether these records are associated with the same crash event; either that individual had multiple visits within three days for the same crash event or multiple crash events within the same three days. Finally, some individuals may have crashed outside of NJ (e.g., in Pennsylvania or New York) and been transported to a hospital in NJ; conversely, some who crashed in NJ may have received care at a hospital outside the state. This is a limitation of any statewide linked data source.

Our findings suggest future crash injury-focused analyses should attempt to examine multiple sources of data, particularly through data linkages. Examining linked data allows researchers to leverage each source of data’s strengths, thereby offsetting their limitations and tempering potential sources of bias in results. Overcoming these types of biases and limitations is exceptionally important in informing the development and allocation of crash injury mitigation efforts.

ACKNOWLEDGEMENTS

We are grateful to Mark R. Zonfrillo, MD, MSCE for generating the AIS mapping matrix that we used to classify injuries in hospital discharge data, as well as providing guidance on implementing the matrix. We would like to thank Emma Sartin, PhD, MPH for her assistance with manuscript revisions. We also would like to thank Adrian Diogo, MPH, Elizabeth Borkowski, BA, and Nicole Caputo for their contributions to this project.

This work was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (R21HD092850 and R21HD098276; Principal Investigator: Allison E. Curry). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The sponsor had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.

Footnotes

CONFLICTS OF INTEREST

The authors have no conflicts of interest to disclose.

DATA AVAILABILITY

The data that support the findings of this study are available from the NJ Motor Vehicle Commission (MVC), NJ Department of Transportation, and NJ Department of Health. Restrictions apply to the availability of these data, which were used under a Memorandum of Agreement and a Data Use Agreement for this study. Data may be available from the authors with the permission of these NJ governmental agencies, a Collaborative Research Agreement with the authors, and with certain restrictions.

REFERENCES

  1. Association for the Advancement of Automotive Medicine (AAAM). The Abbreviated Injury Scale (AIS) 2005 – Update 2008. Association for the Advancement of Automotive Medicine. 2008.
  2. Burch CA, Cook L, Dischinger PC. A comparison of KABCO and AIS injury severity metrics using CODES linked data. Traffic Inj Prev. 2014;15(6):627–630.
  3. Conderino S, Fung L, Sedlar S, Norton JM. Linkage of traffic crash and hospitalization records with limited identifiers for enhanced public health surveillance. Accid Anal and Prev. 2017;101:117–123.
  4. Curry AE, Pfeiffer MR, Metzger KB, Carey ME, Cook LJ. Development of the integrated New Jersey Safety and Health Outcomes (NJ-SHO) data warehouse: catalysing advancements in injury prevention research. Inj Prev. 2021;27(5):472–478.
  5. Davis E, Whyde A, Langton L. Contacts Between Police and the Public, 2015. U.S. Department of Justice Special Report. 2018.
  6. Elvik R, Mysen AB. Incomplete Accident Reporting: Meta-Analysis of Studies Made in 13 Countries. Transp Res Rec. 1999;1665(99):133–140.
  7. Glerum KM, Zonfrillo MR. Validation of an ICD-9-CM and ICD-10-CM map to AIS 2005 Update 2008. Inj Prev. 2019;25(2):90–92.
  8. Hamann C, Peek-Asa C, Butcher B. Racial disparities in pedestrian-related injury hospitalizations in the United States. BMC Public Health. 2020;20(1):1–7.
  9. Hedegaard H, Johnson RL, Thomas KE. The International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) External Cause-of-injury Framework for Categorizing Mechanism and Intent of Injury. Natl Health Stat Report. 2019;(136). https://www.cdc.gov/nchs/products/index.htm.
  10. Injury Surveillance Workgroup. The Transition from ICD-9-CM to ICD-10-CM Guidance for Analysis and Reporting of Injuries by Mechanism and Intent. 2016.
  11. Kamaluddin NA, Abd Rahman MF, Várhelyi A. Matching of police and hospital road crash casualty records – a data-linkage study in Malaysia. Int J Inj Contr Saf Promot. 2019;26(1):52–59.
  12. Labgold K, Hamid S, Shah S, et al. Estimating the Unknown: Greater Racial and Ethnic Disparities in COVID-19 Burden After Accounting for Missing Race and Ethnicity Data. Epidemiology. 2021;32(2):157–161.
  13. Loftis KL, Price JP, Gillich PJ, Cookman KJ, Brammer AL, St Germain T, Barnes J, Graymire V, Nayduch DA, Read-Allsopp C, Baus K, Stanley PA, Brennan M. Development of an expert based ICD-9-CM and ICD-10-CM map to AIS 2005 update 2008. Traffic Inj Prev. 2016;17 Suppl 1:1–5.
  14. National Highway Traffic Safety Administration (NHTSA). MMUCC Guideline Model Minimum Uniform Crash Criteria Fourth Edition. 2012. https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/811631
  15. Perkins NJ, Cole SR, Harel O, et al. Principled Approaches to Missing Data in Epidemiologic Studies. Am J Epidemiol. 2018;187(3):568–575.
  16. Sartin EB, Metzger KB, Pfeiffer MR, Myers RK, Curry AE. Facilitating research on racial and ethnic disparities and inequities in transportation: Application and evaluation of the Bayesian Improved Surname Geocoding (BISG) algorithm. Traffic Inj Prev. 2021;22(sup1):S32–S37.
  17. State of New Jersey Motor Vehicle Commission. New Jersey NJTR-1 Crash Report Manual. 2017. https://www.state.nj.us/transportation/refdata/accident/pdf/NJTR-1CrashReportManual12517.pdf
  18. U.S. Department of Transportation Federal Highway Administration. KABCO Injury Classification Scale and Definitions. 2016. https://safety.fhwa.dot.gov/hsip/spm/conversion_tbl/pdfs/kabco_ctable_by_state.pdf
  19. Watson A, Watson B, Vallmuur K. Estimating under-reporting of road crash injuries to police using multiple linked data collections. Accid Anal and Prev. 2015;83:18–25.
  20. Zonfrillo MR, Weaver AA, Gillich PJ, Price JP, Stitzel JD. New methodology for an expert-designed map from international classification of diseases (ICD) to abbreviated injury scale (AIS) 3+ severity injury. Traffic Inj Prev. 2015;16 Suppl 2:S197–200.
