Sports Health. 2019 Jul 2;11(5):440–445. doi: 10.1177/1941738119854759

Lessons on Data Collection and Curation From the NFL Injury Surveillance Program

Nancy A Dreyer, Christina D Mack, Robert B Anderson, Edward M Wojtys, Elliott B Hershman, Allen Sills
PMCID: PMC6745812  PMID: 31265352

Abstract

Background:

“Research-ready” evidence platforms that link sports data with anonymized electronic health records (EHRs) or other data are important tools for evaluating injury occurrence in response to changes in games, training, rules, and other factors. While there is agreement that high-quality data are essential, there is little evidence to guide data curation.

Hypothesis:

We hypothesized that an EHR used in the course of clinical care and curated for research readiness can provide a robust evidence platform. Our purpose was to describe the data curation used for active injury surveillance by the National Football League (NFL).

Study Design:

Dynamic cohort study.

Level of Evidence:

Level 2.

Methods:

Players provide informed consent for research activities through the collective bargaining process. A league-wide EHR is used to record injuries that come to the attention of the teams’ athletic trainers and physicians, NFL medical spotters, or unaffiliated neurotrauma consultants. Information about football activities and injuries is linkable by player, setting, and event to other sports-related data, including game statistics and game-day stadium quality measures, using a unique player identifier designed to protect player privacy. Ongoing data curation is used to review data completeness and accuracy and is adjusted over time in response to findings.

Results:

The core data curation activities include monthly injury summaries to team staff, queries to resolve incomplete reporting, and periodic external checks. Experiences derived from producing more than 100 reports per year on diverse topics are used to update coding training and related guidance documents in response to missing data or inconsistent coding. Roughly 20% more injuries meeting the same “reportable” definition were recorded after switching from targeted reporting to an EHR.

Conclusion:

Research-ready databases need systematic curation for quality and completeness, along with related action plans. More injuries were reported through EHR than through targeted reporting.

Clinical Relevance:

Evidence-driven decision-making thrives on reliable data fine-tuned through systematic use, review, and ongoing adjustments to the curation process.

Keywords: football, data curation, epidemiology surveillance, electronic health records


Electronic health records (EHRs) are useful as a foundation for “real-world” (RW) health research and can be enhanced through linkage with other data and maintained as an active “evidence platform” available for queries.15 The term RW research has recently come into widespread use by regulators, biopharmaceutical and medical device manufacturers, and clinicians and broadly refers to information on health care derived from nonexperimental settings, including EHRs, claims and billing data, patient registries, and data gathered through personal devices, sensors, and health applications.17 RW data are often assembled into a platform as a collection of related sources, where data are linked and then used as the basis for research and surveillance (an “evidence platform”). Examples include the United States Bone and Joint Initiative’s proposal to reduce the burden of musculoskeletal disease10 and Food and Drug Administration initiatives for health care decision making for benefit-risk assessments.1-3,6 In the absence of established guidance, the central challenge for EHR-enabled research, whether for sports injuries or a broader clinical context, is assuring the data’s reliability for a given purpose.4,14

Here we describe the data collection, data curation, quality improvement, and analytic processes used to study health and safety in the National Football League’s (NFL’s) active players using an EHR linked with a number of relevant data sources.

Methods

This evidence platform integrates customized EHR data for players with game statistics, roster information, game-day conditions, and other sources, covering more than 8000 NFL players and more than 42,000 injuries over the past decade. While each NFL team has access to its players’ medical data, players also provide informed consent and agree to share data according to a process established through a collective bargaining agreement. The governance process covers submission of medical research questions, their review and approval, and subsequent execution of analyses, including protocol review and approval by the Mt Sinai Institutional Review Board.

Release of the data is governed by the Medical Research Application Process (MRAP), a joint agreement between the NFL and the NFLPA (NFL Players Association), and any research using the data must go through the MRAP. Questions are posed regularly by the League as well as by players, owners, general managers, Certified Athletic Trainers (ATCs), team physicians, and medical staff, and by a network of medical professionals working with the NFL, a concept that has been referred to as the “University of the NFL.”13 Standing committees are organized in a variety of medical areas, which facilitates cross-specialization discussion among subject matter experts, team physicians, and ATCs. Data curation and most analytics are conducted by a third party, IQVIA, a human data science company under contract to the NFL, with daily uploads of EHR data.

Data Collection

The NFL has been conducting systematic surveillance for injuries sustained by players for more than 30 years.16 In the early years of injury surveillance, data collection focused on voluntary reporting of concussions, fractures, heat-related illness, methicillin-resistant Staphylococcus aureus infections, and any injury that required medical intervention or time loss from a practice or game. Over time, the injury reporting system became mandatory. The increased emphasis on reporting was accompanied by changes in injury coding using a customized system of diagnosis terms relevant specifically to NFL player illnesses and injuries. Clinical grade, partial versus complete tear, and other markers of injury severity are collected when medical staff can evaluate and assign them in an accurate and standardized way.

The program was substantially upgraded in 2014 when a League-wide initiative, in partnership with the NFLPA, was launched to capture injury and treatment information through an EHR system specially adapted for use in sports, allowing for a more comprehensive examination of injury occurrence. The focus was broadened to reporting all complaints evaluated or treated by medical staff, regardless of lost time, throughout the full year, including the offseason. In 2015, surveillance broadened further to a more clinical focus, requiring reporting of all such injury complaints (Figure 1).

Figure 1. Evolution of data collection for injury surveillance in the National Football League (NFL). MRSA, methicillin-resistant Staphylococcus aureus.

ATCs and team physicians enter medical information into the EHR year-round, reflecting all injuries along with clinical and rehabilitation care provided to active players in the NFL across the full roster (53 athletes per team in the regular season, with 90 in preseason). Unique numeric player identifiers are linkable to the NFL Game Statistics and Information System, reports from field medical personnel such as Unaffiliated Neurotrauma Consultants (independent neurotrauma-specific physicians who provide additional medical consultation for in-game concussion evaluations), and data from the NFL Game Day Surface Task Force. EHR data are transferred daily to a secure environment, from which more than 3000 distinct tables of unlinked raw data are examined for information that may be relevant to a given research question. Analytic data sets are then assembled by processing, linking, extracting, and creating operational definitions for each data element. Information about injuries experienced during conditioning, practice activities, and games is collected year-round and includes anatomic location of the injury, physical findings and results of diagnostic testing, as well as how much time, if any, a player misses from participation in football activities (ie, practices and games) (Table 1).
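To make this assembly step concrete, the sketch below links a daily injury extract to game-level data on shared identifiers and derives a simple time-loss flag. It is a minimal illustration in Python with pandas, not the platform’s actual pipeline: the file names, column names, and the time-loss definition shown here are hypothetical stand-ins for a schema that in reality spans more than 3000 raw tables and its own operational definitions.

```python
# Minimal sketch of assembling an analytic data set by linking an EHR injury
# extract to game-level data. File and column names are hypothetical.
import pandas as pd

# Daily EHR extract: one row per injury complaint entered by team medical staff,
# keyed by an anonymized player identifier and a game identifier.
injuries = pd.read_csv(
    "injuries.csv",
    parse_dates=["onset_date", "return_to_full_participation"],
)

# Game statistics and game-day conditions, keyed by the same game identifier.
games = pd.read_csv("game_stats.csv", parse_dates=["game_date"])

# Link each injury to the game in which it occurred.
analytic = injuries.merge(games, on="game_id", how="left", validate="many_to_one")

# Example operational definition: days missed and a time-loss flag.
analytic["days_missed"] = (
    analytic["return_to_full_participation"] - analytic["onset_date"]
).dt.days
analytic["time_loss"] = analytic["days_missed"].fillna(0) > 0
```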

Table 1.

Selected data elements used for NFL health and safety evaluation

Injury Details a
• Onset date
• Clinical impression code
• Clinical description of injury
• Laterality
• Injury mechanism c
• Nature of injury
At time of injury:
• Team activity c
• Player activity
• Player position
• Quarter of game

Treatment and Outcome Information a
• Date of removal from participation
• Date of return to full participation
• Games missed
• Practices missed
• Surgery status
• Surgery CPT code
• Imaging
Player information:
• Age
• Height/weight/BMI
• Injury history in NFL

Game Day Information b
• Game ID
• Date/time
• Day of week
• NFL week/season
• Location
• Stadium type
• Home/Away teams
• Surface type
• Weather conditions
• Total number of plays
• Play type and outcome
Surface Quality:
• Specific surface type
• Field hardness
• Infill level

BMI, body mass index; CPT, Current Procedural Terminology; NFL, National Football League.

a NFL Injury Surveillance System, NFL electronic health record.

b Game day data from stadium and local weather are provided through the NFL Game Statistics and Information System.

c Injury mechanism as reported by the Athletic Trainer (ATC) and medical staff through sideline assessments; for example, contact, type of contact, and point of impact with opponent’s body, if any. Team activity consists of blocking, tackling, and so on. These fields can be limited and, as needed, assessments are refined by video review.

In addition to clinical information typically collected within an EHR, sport-specific information is obtained, including setting-related factors such as play type, player position, contact type, impact source, and football activity during injury (Table 1). These clinical and situational data in the EHR are linked based on a game identifier to NFL game data, including game location, stadium and field surface type, play counts by type and outcome of the play, and other key factors related to the game situation (Figure 2). For concussion studies, game-day evaluation reports from unaffiliated neurotrauma consultants present on the field are linked to EHR data for ascertainment of final diagnosis, and rigorous expert third-party video review is performed to confirm the point of impact for concussive events.11

Figure 2. National Football League (NFL) evidence platform for player health and safety. ATC, certified athletic trainer; UNC, unaffiliated neurotrauma consultant.

Data Curation

Curation comprises a variety of quality control checks, including review of targeted injuries coupled with regular team ATC follow-up to assure completeness of reporting. Data quality reports customized for each of the 32 clubs are issued to team medical staff monthly during the season. Periodic trainings of the ATC staff are used to standardize data entry and draw attention to the importance of consistent and accurate entries by highlighting specific areas that require completion or augmentation, as well as collecting feedback on difficulties in data entry that may be modifiable. Data entry guidance documents are issued periodically to provide practical references for ATCs and give structure as to what and where data should be entered. External sources, including public media reports, are also consulted periodically to ensure complete reporting. Discrepancies between club-reported data and media-reported injuries, along with any other potential errors or important missing data found through these reviews, are raised directly with club medical staff and corrected as needed. Results describing injury trends, setting, and severity are provided to team medical staff and compare club data with League-wide incidence.
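As a concrete illustration of the completeness checks described above, the sketch below summarizes how fully each required field is populated by club and lists records needing follow-up. It is offered only as an assumption-laden example: the required fields, the club and injury_id columns, and the output format are hypothetical, and the actual monthly reports cover far more elements.

```python
# Illustrative completeness summary and query list for club-level data quality
# review. Field and column names are hypothetical placeholders.
import pandas as pd

REQUIRED_FIELDS = [
    "onset_date",
    "clinical_impression_code",
    "laterality",
    "injury_mechanism",
    "player_activity",
]

def completeness_by_club(injuries: pd.DataFrame) -> pd.DataFrame:
    """Percentage of injury records with each required field populated, by club."""
    populated = injuries[REQUIRED_FIELDS].notna()
    return populated.groupby(injuries["club"]).mean().mul(100).round(1)

def records_to_query(injuries: pd.DataFrame) -> pd.DataFrame:
    """Injury records with any required field missing, to be queried with club staff."""
    missing_any = injuries[REQUIRED_FIELDS].isna().any(axis=1)
    return injuries.loc[missing_any, ["injury_id", "club"] + REQUIRED_FIELDS]
```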

Results

Looking only at injuries that would have been reported under the “reportable” definition in use from 2011 through 2014, the change from targeted electronic data collection to data collection through an EHR resulted in approximately 20% more injuries recorded during practices and games in-season (average of 2845 events/year for 2011-2013 vs 3400/year for 2014-2017) and an even greater increase when all injuries were included. For example, in 2017, 7154 injuries were reported during the preseason and regular season, with 3400 of them resulting in time missed from a practice or a game.
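As a rough check on that figure, the reported in-season averages imply (3400 − 2845) / 2845 ≈ 0.195, consistent with the approximately 20% increase.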

External checks were conducted periodically against a select group of public sources to identify reports of concussions, injuries to the neck and spine, knee injuries, Achilles injuries, fractures, and surgeries. While team medical staff are the trusted source for diagnosis of an injury, reviewing and discussing external reports with teams is a way to audit the completeness of injury reports. As with all data curation activities, NFL team medical staff were queried directly about any apparent discrepancies, with corrections made if appropriate. In the 2017 season, the vast majority of publicly reported NFL injuries identified from these sources were found during these audits to be accurately recorded in the EHR by team medical staff; only 15 injuries (<2%) required correction or augmentation.
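The reconciliation of public reports against the EHR can be thought of as an anti-join: public reports without an apparent EHR match are the ones to discuss with club medical staff. The sketch below is only an illustration under simplified assumptions; the column names and the player-plus-week matching rule are hypothetical, and the actual audit is a clinician-led review rather than an automated match.

```python
# Sketch of flagging publicly reported injuries with no apparent EHR match,
# so they can be reviewed with club medical staff. Column names and the
# matching rule (same player and injury week) are simplified assumptions.
import pandas as pd

def unmatched_public_reports(
    public_reports: pd.DataFrame, ehr_injuries: pd.DataFrame
) -> pd.DataFrame:
    """Public injury reports with no EHR record for that player and week."""
    merged = public_reports.merge(
        ehr_injuries[["player_id", "injury_week"]].drop_duplicates(),
        on=["player_id", "injury_week"],
        how="left",
        indicator=True,
    )
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")
```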

Discussion

These data are being used to address a variety of questions as they arise, with information provided to the league as it focuses on prevention efforts. Examples include an examination of the association between stadium surfaces (artificial turf versus natural grass) and injury occurrence, a critical question for NFL athletes and clubs.9,12 Analyses found higher rates of lower extremity injuries on artificial surfaces. Examinations of the unaffiliated neurotrauma consultant and athletic trainer spotter programs for concussion detection, as well as epidemiologic descriptions of high-impact injuries, are also in progress.
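For context on how such surface comparisons are exposure-adjusted, the sketch below computes lower extremity injuries per 1000 plays by surface type. It is an illustrative simplification under assumed column names, not the methodology of the cited analyses, which use more detailed injury definitions and exposure measures.

```python
# Sketch of an exposure-adjusted injury rate comparison by playing surface.
# Column names (body_region, surface_type, total_plays) are hypothetical.
import pandas as pd

def lower_extremity_rate_per_1000_plays(
    injuries: pd.DataFrame, games: pd.DataFrame
) -> pd.Series:
    """Lower extremity injuries per 1000 plays, by surface type."""
    le_injuries = injuries[injuries["body_region"] == "lower_extremity"]
    injury_counts = (
        le_injuries.merge(games[["game_id", "surface_type"]], on="game_id")
        .groupby("surface_type")
        .size()
    )
    play_counts = games.groupby("surface_type")["total_plays"].sum()
    return (injury_counts / play_counts) * 1000
```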

A carefully crafted and research-driven approach to data extraction and curation is needed because of the sheer volume of EHR data, as well as their unstructured nature, which allows information to be recorded in many different locations. As with any good evidence platform, the utility of this program depends in large measure on the ability to easily record, identify, extract, link, and utilize accurate and relevant data on demand. As with any good research, data curation needs to focus on the core data elements: accurate and complete injury reporting during conditioning, practices, and games, and consistently accurate linkage to other football-related exposures.

At a practical level, the 3 major challenges of using EHR for research are (1) large amounts of data needed for research that are recorded in an unstructured format and often in locations that are not generally accessible to researchers (eg, “notes”); (2) inconsistent recording of information; and (3) lack of data essential to a particular investigation (missing data).5,7,8

Like data from EHR in typical clinical practice settings, information on players’ injuries is recorded by those providing care. Unlike research conducted in typical RW clinical settings, however, these data are collected in a relatively controlled setting using a common EHR system tailored to orthopaedic and neurologic issues and internal medicine. Data on injuries are entered by a relatively small and highly trained team, and the data undergo systematic quality assurance review. A data dictionary is maintained that contains operational definitions in situations where conventional coding schemes are not used, and medical and training staff undergo periodic training about data entry.

Solutions that can be implemented in a closed system such as the NFL include improvements to the technical interface to reduce data entry burden, more comprehensive capture of missed time in games and practices, provision of additional guidance through a team of ATCs, and external quality control by comparison of EHR data with media and other injury reports. Nonetheless, there are many injuries, and entering all requisite data is time-consuming, much like at any medical facility; additionally, diligence in reporting can be variable across providers and clubs. Also, researchers need to be mindful about technical system changes that are likely to be made periodically to commercially available EHR, which may affect data completeness, accessibility, or transfers. Quantifying injury severity in a standard manner across teams can be difficult.

Interpreting changes over time in these data requires accounting for concurrent changes in reporting, which can result from a more comprehensive data collection program as well as from temporal changes in awareness that lead to increased reporting. A good understanding of injury risk also should account for sport-specific temporal changes such as rules and roster size, variations in the number of practices and games per season, and variation in the number of plays per game over time, all of which may affect a player’s on-field “exposure” to the chance of being injured. In the case of the NFL, the movement from a classic electronic surveillance system to collection of all events through a clinical EHR has increased reporting of injuries, particularly less severe ones. As with any data source, it is important to be aware of changes in data collection processes and to consider how these may affect interpretation.

Taken as a whole, an EHR-based evidence platform that is research-ready and sufficiently reliable to guide practical decision making needs systematic data curation and reporting. Data curation is best shaped by experience with analyses and actions; it generally includes ongoing data review and augmentation, reporter training, and quality improvement programs. Here, data review and feedback, regular trainings, ATC interviews, and the development and use of guidance documents are essential components for maintaining a high-quality system that is robust enough to support decision making. These activities are supported by a team of epidemiologists, data scientists, biostatisticians, and data architects who are familiar with the contents of the database and the methods needed to use it.

A key strength of this program is the ability to study a complete and clearly defined population at risk while capturing detailed information on player health that supports many different research initiatives, which then also informs data curation activities. Together with numerous data linkages and quality efforts, the program provides a strong research tool for understanding trends and for both explanatory and predictive studies that can broadly contribute to the body of scientific evidence for sports medicine.

Pragmatic RW evidence programs such as these require full understanding of the processes involved in data entry and attention to maintaining quality for key outcomes and data of interest. As we advance our culture of evidence-driven decision making, we will see more calls for strong platforms to enable applied research, and the lessons learned from the NFL may benefit other sports programs and health care overall. The ultimate test, however, is how well these data and analytics inform policy and practices that lead to improvements in player health and safety.

Acknowledgments

The authors wish to acknowledge the contribution of the IQVIA Analytics Team composed of Kristina Zeidler (data collection, data quality, and analysis), Kristin Shiue (data quality and analysis), Randell Grenier (data curation), Mackenzie Herzog (analysis), and Rachel Sendor (data collection and analysis), as well as the thoughtful editing suggestions provided by Dr Gary Solomon, Senior Advisor to the NFL Department of Health and Safety.

Footnotes

The following authors declared potential conflicts of interest: N.A.D. and C.D.M. are employed by IQVIA; R.B.A. received consulting fees from Amniox, Artelon, Arthrex, Bioventus, DJO, Wright Medical, and Zimmer Biomet, and royalties from Arthrex, Biomet, Wright Medical, and Zimmer Biomet; E.M.W. received travel payments from the National Football League (NFL) and AOSSM and payments from the NFL and NIAMS/KAI, he also serves as editor in chief for Sports Health; E.B.H. received consulting fees and royalties from Active Implants Corp and royalties from Zimmer Biomet; and A.S. is employed by the NFL. This research program was funded by the NFL and approved by the National Football League Players Association under a collectively bargained agreement pertaining to NFL player research. The authors assert that these results are presented clearly, honestly, and without fabrication, falsification, or inappropriate data manipulation.

References

• 1. Califf RM. Benefit-risk assessments at the US Food and Drug Administration: finding the balance. JAMA. 2017;317:693-694.
• 2. Califf RM, Robb MA, Bindman AB, et al. Transforming evidence generation to support health and health care decisions. N Engl J Med. 2016;375:2395-2400.
• 3. Center for Devices and Radiological Health, Center for Biologics Evaluation and Research, US Food and Drug Administration. Use of real-world evidence to support regulatory decision-making for medical devices. Guidance for industry and Food and Drug Administration staff. US Food and Drug Administration; August 2017. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/use-real-world-evidence-support-regulatory-decision-making-medical-devices. Accessed June 16, 2019.
• 4. Daniel G, Silcox C, Bryan J, et al. White paper: Characterizing RWD Quality and Relevancy for Regulatory Purposes. Published October 1, 2018. https://healthpolicy.duke.edu/sites/default/files/atoms/files/characterizing_rwd.pdf. Accessed June 16, 2019.
• 5. Dreyer NA. Advancing a framework for regulatory use of real-world evidence: when real is reliable. Ther Innov Regul Sci. 2018;52:362-368.
• 6. Dreyer NA, Rodriguez AM. The fast route to evidence development for value in healthcare. Curr Med Res Opin. 2016;32:1697-1700.
• 7. Gliklich RE, Dreyer NA, Leavy MB. Registries for Evaluating Patient Outcomes: A User’s Guide. 3rd ed. Rockville, MD: Agency for Healthcare Research and Quality; 2014.
• 8. Gliklich RE, Dreyer NA, Leavy MB, Christian JB. 21st Century Patient Registries: Registries for Evaluating Patient Outcomes: A User’s Guide: Addendum. 3rd ed. Rockville, MD: Agency for Healthcare Research and Quality; 2018.
• 9. Hershman EB, Anderson R, Bergfeld JA, et al. An analysis of specific lower extremity injury rates on grass and FieldTurf playing surfaces in National Football League games: 2000-2009 seasons. Am J Sports Med. 2012;40:2200-2205.
• 10. Jacobs JJ, King TRW, Klippel JH, et al. Beyond the decade: strategic priorities to reduce the burden of musculoskeletal disease. J Bone Joint Surg Am. 2013;95:e1251-e1256.
• 11. Lessley DJ, Kent RW, Funk JR, et al. Video analysis of reported concussion events in the National Football League during the 2015-2016 and 2016-2017 seasons. Am J Sports Med. 2018;46:3502-3510.
• 12. Mack CD, Hershman EB, Anderson RB, et al. Higher rates of lower extremity injury on synthetic turf compared with natural turf among National Football League athletes: epidemiologic confirmation of a biomechanical hypothesis. Am J Sports Med. 2019;47:189-196.
• 13. Matava MJ, Gortz S. The University of the National Football League: how technology, injury surveillance, and health care have improved the safety of America’s game. J Knee Surg. 2016;29:370-378.
• 14. Miksad RA, Abernethy AP. Harnessing the power of real-world evidence (RWE): a checklist to ensure regulatory-grade data quality. Clin Pharmacol Ther. 2017;103:202-205.
• 15. Office of Medical Policy and the Office of Translational Sciences, Center for Drug Evaluation and Research, US Food and Drug Administration. Use of Electronic Health Record Data in Clinical Investigations: Guidance for Industry. Silver Spring, MD: US Food and Drug Administration; July 2018. https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM501068.pdf. Accessed June 16, 2019.
• 16. Powell JW, Schootman M. A multivariate risk analysis of selected playing surfaces in the National Football League: 1980 to 1989. An epidemiologic study of knee injuries. Am J Sports Med. 1992;20:686-694.
• 17. Sherman RE, Anderson SA, Dal Pan GJ, et al. Real-world evidence—what is it and what can it tell us? N Engl J Med. 2016;375:2293-2297.
