INTRODUCTION
Good stewardship of research data begins with a comprehensive data management plan to facilitate the organization and access of information for all end-users.1 Data accessibility is not “one size fits all” across end-user researchers. Most research teams have a diverse membership, including senior and junior scientists, clinicians, and students from various disciplines, all with varying comfort levels accessing and manipulating data. Often, most study team members must rely on a statistician and/or programmer intermediary to access data, which can slow information retrieval.2,3 In addition, traditional approaches to data management have been plagued by poorly organized or lost data.4–6 Effective January 25, 2023, the United States National Institutes of Health (NIH) will require all research proposals to include data sharing and management plans outlining how data will be organized and what happens to the data when the funding ends.7 Today, healthcare research data management remains underdeveloped.8–10 As a result, healthcare research data have not been used to their full potential.11
The ongoing clinical trial, Turn Everyone and Move for Ulcer Prevention (TEAM-UP),12,13 included collecting data from a variety of new disparate data sources representing primary data from wearable patient accelerometer sensors and secondary data from electronic health records (EHRs) and the federally mandated Minimum Data Set (MDS) for each nursing home (NH) resident. The volume (N=816 data files), complexity, and diversity of these data presented multiple integration and compatibility challenges requiring improved healthcare research data management and oversight.
The team’s goals were to develop a solution to overcome these challenges and to apply preparatory work and data structuring that help end-users efficiently access, merge, and manipulate the trial’s multiple data sources. A relational database and a readily accessible dashboard interface were developed to consistently provide a wide variety of end-users with high-quality and easily accessible data, thus enabling sustained data access, broader and deeper analyses, and secondary data analysis even after the funding ends. This approach is consistent with the forthcoming NIH data sharing guidelines.7,14
Purpose
This manuscript provides an exemplar for healthcare research data management by describing the development of a relational database and customized dashboard/query instrument to democratize access and use of the large, diverse, and complex data collected for the TEAM-UP clinical trial.
Background
The primary aim of the TEAM-UP trial was to compare 28-day pressure injury (PrI) incidence (formerly referred to as pressure ulcers) among NH residents repositioned at 2, 3, or 4-hour intervals.12,13 A PrI is commonly defined as localized damage to the skin or underlying tissue due to pressure alone or pressure in combination with shear,15 and develops in approximately 15%−25% of NH residents.16,17 Prevention efforts typically include repositioning every two hours (turning to redistribute pressure on body tissue in various locations), yet the optimal repositioning interval is unknown. The TEAM-UP trial assigned 9 NHs to one of three NH-wide repositioning interval arms (2, 3, or 4-hour). Residents’ tri-axial accelerometer sensors objectively measured resident movement and compliance with the assigned repositioning intervals for a 4-week intervention period. Repositioning due times were displayed on conveniently placed monitors to cue staff.
The primary collection of sensor data used to capture movement-related variables (position, time) and compliance with turning protocols for various NH resident populations was time-consuming and expensive. The team aimed to enhance the research value of these data by implementing a relational database containing sensor data along with EHR and MDS data for the sample residents in the Intervention. In addition, EHR and MDS data were retrieved for the 12-month pre-Intervention period, and a customized dashboard was created. An exemplar research question illustrates the execution of the dashboard and the end-user experience.
METHODS
To meet the overarching goal of facilitating extraction of knowledge from data to answer individual research questions, the team followed a process of well-defined stages. It began with the design of data reorganization into a relational database. The process included converting the raw data to a tabular format based on an entity-relationship diagram (ERD) and loading these tables into an Oracle (Oracle Corp., Austin, TX, USA) database.18 A dashboard/query instrument was added to facilitate specific user requests for a subset of data or for statistics about the data using a graphical user interface (GUI) to formulate customized queries and retrieve a specified dataset for subsequent analysis. Dashboard development required advanced programming to translate an end-user’s request, entered in Microsoft Excel (Microsoft Corp., Redmond, WA, USA)19 (e.g., variable names and data constraints), into the desired dataset output via a process that runs behind the scenes and requires no programming by the end-user.
Relational Database Design and Implementation
The goal of developing an ERD was to combine the 816 separate data files into subject areas and divide the information into a tabular structure to reduce redundancy, ensure accuracy, and protect data integrity. It is essential first to understand the strengths and limitations of the data, which will be the foundation of research analyses. Our primary and secondary data from diverse sources contained different types of information with varying degrees of reliability and included different intersecting populations. The ERD provided a visual structural starting point for database design, defining relationships between these different data components or entities. The following core considerations were identified to guide ERD development. Each table should cover a single subject area, include a unique identifier (in this case, the study identifier number (Study ID)), and be devoid of duplicate entries. It was essential to comprehend each variable’s data structure (e.g., date and time) and the amount of data (e.g., single versus multiple observations). Validity of the data depended on completeness and verification of “legal” values. Editing ensured data were placed into tables according to specified requirements that limited entries to the appropriate numeric or non-numeric values; for example, body mass index is numeric but cannot be negative. Once the ERD was designed, the created tables were loaded into an Oracle database.18 Detailed operational definitions for the variables in each table are also available in a working document entitled ‘data descriptor,’ which is stored within the database and available for download. The data descriptor file is maintained by a study investigator and is updated whenever new data are added to the database.
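The validation rules described above (a unique Study ID per participant and “legal” values such as a non-negative body mass index) can be expressed as table constraints. The following is a minimal sketch, not the TEAM-UP implementation: it uses Python's built-in sqlite3 module in place of Oracle, and the table name, column names, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical, simplified schema; names are illustrative, not the trial's own.
SCHEMA = """
CREATE TABLE study_participants (
    study_id     TEXT PRIMARY KEY,                        -- unique, no duplicates
    nursing_home TEXT NOT NULL,
    bmi          REAL CHECK (bmi IS NULL OR bmi >= 0)     -- numeric, never negative
);
"""

def load_participants(conn, rows):
    """Insert rows one at a time and return the rows the constraints rejected."""
    rejected = []
    for row in rows:
        try:
            with conn:  # commits on success, rolls back on failure
                conn.execute(
                    "INSERT INTO study_participants VALUES (?, ?, ?)", row
                )
        except sqlite3.IntegrityError as err:
            rejected.append((row, str(err)))
    return rejected

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample rows for illustration only.
rejected = load_participants(conn, [
    ("R001", "NH-A", 27.4),
    ("R002", "NH-A", -3.0),   # illegal value: negative BMI
    ("R001", "NH-B", 31.0),   # duplicate Study ID
])
loaded = conn.execute("SELECT COUNT(*) FROM study_participants").fetchone()[0]
print(f"loaded={loaded}, rejected={len(rejected)}")
```

Declaring the rules in the schema means that every load path is checked the same way, rather than relying on each loading script to repeat the edits.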
The specialized skillset required to develop the relational database and dashboard included expertise in data modeling, an understanding of relational databases and data analysis, and working knowledge of numerous hardware and software products. The software that supported the dashboard development included Oracle (version 19c)18 and SAS 9.4 (TS1M7) (SAS Institute Inc., Cary, NC, USA)20, along with the SAS/ACCESS add-on and Excel Visual Basic for Applications (VBA). SAS/ACCESS provides enterprise data access and integration between SAS and the Oracle database. The Oracle database was housed within a secure University platform.
Dashboard Development
A customized dashboard was designed to enable the end-user to execute an Excel-based query and export the generated Excel output dataset, which can be analyzed within Excel or uploaded into other statistical programs for more complex analysis.19 Graphical user interfaces for both input and output were written in Excel Visual Basic for Applications (VBA) and included three conceptual steps that can be implemented serially or in parallel; breaking the methodology into independent, analytic functional modules allowed for comprehensive customization of the dashboard.
The dashboard design is shown in Figure 1. First, the end-user completes the input Excel spreadsheet using the GUI. Data entered into the input Excel spreadsheet represents a specific user-defined request for a subset of data posed to the database system and includes variable names and any data constraints. Next, multiple user-defined combinations of data are compiled and converted into executable SQL statements behind the scenes using the predefined script. Excel VBA was chosen for the predefined script as it allows the entire dashboard workflow to be contained within Excel, except for the SQL generator engine that is written in SAS. The Model Base is the dynamic SQL generator that leverages the robust SAS macro language.20 A good practice when writing SQL is to define a base table; in this case, the base table was the Study Participants table representing a unique resident in each NH. Having a clearly defined base table reduced the programmatic complexity of dynamic SQL creation as table joins were well-defined. The clearly defined base table also diminished the potential for user error, as table joins were always one-to-one or one-to-many with respect to the resident (the unit of analysis). The final step in the dashboard development process produces output in the form of an Excel CSV file, which the GUI application inserts into the end-user’s requested file directory.
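The idea of compiling a user request into SQL around a fixed base table can be sketched as follows. This is an illustrative Python stand-in for the SAS macro-based generator described above, not the authors' code; the table names, column names, and the request format are assumptions made for the example.

```python
BASE_TABLE = "study_participants"   # hypothetical name for the base table
KEY = "study_id"                    # hypothetical join key (Study ID)
ALLOWED_OPS = {"=", "<", "<=", ">", ">=", "BETWEEN"}

def _check_identifier(name):
    """Reject anything that is not a plain identifier before it reaches the SQL."""
    if not name.isidentifier():
        raise ValueError(f"invalid table or column name: {name!r}")

def build_query(variables, constraints=()):
    """Build a parameterized SELECT statement.

    variables:   list of (table, column) pairs requested by the end-user
    constraints: list of (table, column, operator, value) tuples
    """
    for table, column in variables:
        _check_identifier(table)
        _check_identifier(column)
    for table, column, _, _ in constraints:
        _check_identifier(table)
        _check_identifier(column)

    select_cols = [f"{BASE_TABLE}.{KEY}"] + [f"{t}.{c}" for t, c in variables]

    # Every non-base table is joined to the base table on the Study ID,
    # so each join is one-to-one or one-to-many with respect to the resident.
    tables = []
    for table in [v[0] for v in variables] + [c[0] for c in constraints]:
        if table != BASE_TABLE and table not in tables:
            tables.append(table)
    joins = [f"LEFT JOIN {t} ON {t}.{KEY} = {BASE_TABLE}.{KEY}" for t in tables]

    where, params = [], []
    for table, column, op, value in constraints:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        if op == "BETWEEN":
            where.append(f"{table}.{column} BETWEEN ? AND ?")
            params.extend(value)          # value is a (low, high) pair
        else:
            where.append(f"{table}.{column} {op} ?")
            params.append(value)          # values are bound, never interpolated

    parts = [f"SELECT {', '.join(select_cols)}", f"FROM {BASE_TABLE}", *joins]
    if where:
        parts.append("WHERE " + " AND ".join(where))
    return " ".join(parts), params

# Hypothetical request: two variables and two constraints.
sql, params = build_query(
    variables=[("diagnoses", "dementia"), ("sensor_summary", "lying_time")],
    constraints=[
        ("braden", "period", "=", "Intervention"),
        ("braden", "mean_score", "BETWEEN", (10, 12)),
    ],
)
print(sql)
print(params)
```

Because every join goes through the base table, the generator never has to infer how two subject-area tables relate to each other, which is the reduction in complexity the paragraph above describes.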
Figure 1: Dashboard Design
Privacy and Security
The TEAM-UP database includes protected health information and is therefore subject to Institutional Review Board approval. Access requires a password, and all transactions, including reads, inserts, updates, edits, and exports, are controlled and logged.
RESULTS
This project’s ERD is depicted in Figure 2. Study Participants serves as the “base table” from which the end-user defines the study sample based on participant characteristics relevant to the query of interest. The remaining boxes represent specific subject areas, such as laboratory results, from which desired data can be selected and included in the query’s output dataset to be analyzed. The cardinality, or connection between the entities, is represented with different arrows. For example, a one-to-many relationship exists between Study ID and laboratory results since one resident may have many distinct laboratory values. The following exemplar research question is used to illustrate the overall dashboard experience: How do movement patterns of NH residents at high risk for PrI differ among those residents diagnosed with obesity or dementia who did and did not develop a PrI?
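The one-to-many relationship between a resident and that resident's laboratory results can be illustrated with a small sketch. It uses Python's built-in sqlite3 module rather than Oracle, and the table names, column names, and values are hypothetical. It shows that a plain join returns one row per laboratory value, while grouping by Study ID returns one row per resident, the unit of analysis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and made-up values, for illustration only.
conn.executescript("""
CREATE TABLE study_participants (study_id TEXT PRIMARY KEY);
CREATE TABLE lab_results (
    study_id  TEXT NOT NULL REFERENCES study_participants(study_id),
    test_name TEXT NOT NULL,
    value     REAL
);
INSERT INTO study_participants VALUES ('R001'), ('R002');
INSERT INTO lab_results VALUES
    ('R001', 'albumin', 3.1),
    ('R001', 'albumin', 3.4),
    ('R001', 'hemoglobin', 11.2);
""")

# One-to-many join: one output row per laboratory value. A resident with no
# laboratory values still appears once because of the LEFT JOIN.
joined = conn.execute("""
    SELECT p.study_id, l.test_name, l.value
    FROM study_participants AS p
    LEFT JOIN lab_results AS l ON l.study_id = p.study_id
""").fetchall()

# Collapsing back to the unit of analysis: one row per resident.
per_resident = conn.execute("""
    SELECT p.study_id, COUNT(l.value) AS n_labs
    FROM study_participants AS p
    LEFT JOIN lab_results AS l ON l.study_id = p.study_id
    GROUP BY p.study_id
    ORDER BY p.study_id
""").fetchall()

print(len(joined))
print(per_resident)
```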
Figure 2: Entity Relationship Diagram
The end-user dashboard experience is shown in Figures 3 and 4 based on the exemplar question. Figure 3 (Exemplar Input Excel) shows the input Excel for the exemplar question. The end-user specified the input GUI with the table and variable names representing a specific request for a subset of data to answer the exemplar question. The input variables included dementia and obesity diagnoses, PrI status, Braden Scale scores, and movement data obtained from the triaxial sensor (lying time, upright time, lying frequency, and upright frequency). Four Oracle tables contained a superset of the requested data. Two constraints (time=Intervention and Braden mean score 10–12 (high risk)) complete the specification of all input requirements. Behind the scenes, the Excel VBA code compiled the end-user’s inputs and the model base executed the SQL statements. Finally, the end-user received back the Excel CSV output containing requested data for all of the residents satisfying the input criteria (Figure 4, Exemplar Question Dashboard Query Results).
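The two constraints in the exemplar request (time=Intervention and a mean Braden Scale score of 10–12) could translate into SQL along the following lines. This is a hedged sketch only: the manuscript does not show the generated SQL, so the table layout, column names, the way the mean is computed, and the sample scores are all assumptions, and sqlite3 stands in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and made-up scores, for illustration only.
conn.executescript("""
CREATE TABLE braden_scores (study_id TEXT, period TEXT, score INTEGER);
INSERT INTO braden_scores VALUES
    ('R001', 'Intervention', 11), ('R001', 'Intervention', 12),
    ('R002', 'Intervention', 16), ('R002', 'Intervention', 15),
    ('R003', 'Pre-Intervention', 10), ('R003', 'Intervention', 14);
""")

# Constraint 1 (time=Intervention) filters rows before averaging;
# constraint 2 (mean score 10-12) filters residents after averaging.
high_risk = conn.execute(
    """
    SELECT study_id, AVG(score) AS mean_braden
    FROM braden_scores
    WHERE period = ?
    GROUP BY study_id
    HAVING AVG(score) BETWEEN ? AND ?
    ORDER BY study_id
    """,
    ("Intervention", 10, 12),
).fetchall()

print(high_risk)
```

In this toy data only one resident qualifies: the second has a mean above 12, and the third has a low score only in the pre-Intervention period, which the time constraint excludes.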
Figure 3: Exemplar Input Excel
Figure 4: Exemplar Question Dashboard Query Results
Results from the exemplar question statistical analysis showed that among the 34 residents with high-risk Braden Scale Scores, 2 (6%) had a history of PrI, 29 (85%) had a dementia diagnosis, and 11 (32%) were obese. There were no significant differences in movement patterns in residents who did or did not develop a PrI. The results from the exemplar question were identical to those that would be generated using the traditional process of identifying and extracting the relevant data from each table and then manually joining tables using SQL scripts. However, the conventional data extraction process requires an intimate understanding of the underlying data structure and SQL, whereas generating the results from the dashboard is a much faster process that does not require the end-user to know the underlying data structure or use the SQL programming language. The consistency between manual and dashboard data extraction shows the dashboard is an efficient and effective way for end-users to access and query the TEAM-UP database.
DISCUSSION
Effective data management is critical to facilitate knowledge discovery and to address the increasing research data requirements for maintenance, retention, and dissemination.14,21 The preservation of accurately stored and retrievable data is intended to expedite expert translation of results into evidence needed to support decision-making based on these data.22 Maintenance of, access to, and use of original research data require many technical skills, such as file management, database software use, and practical understanding of data structuring.23,24 This manuscript describes how our team developed and used a relational database augmented by a customized dashboard to organize and facilitate end-user access for in-depth or exploratory analyses on an ad-hoc basis during the TEAM-UP clinical trial. The exemplar research question demonstrated how clinicians and other research team members could easily extract complex data without acquiring specialized technical skills to execute syntax queries. Using the customized dashboard to access data for analysis reduced the time and potential expense of extracting data, thus enhancing end-user productivity.
The customized dashboard included with the TEAM-UP relational database gave users unfamiliar with SQL (e.g., clinicians and other non-technical users) access to the data without requiring a statistician or programmer intermediary. The dashboard helped end-users query multiple data sources (such as EHR data and sensor movement data) and gave them the flexibility to combine those sources based on a research question, thus allowing the team more time to use the resulting evidence to initiate clinical practice improvements.
The foresight and initiative of the research team led to development of this relational database and customized dashboard that are consistent with the NIH data management and sharing guidelines slated to become effective in January 2023.14 These recently promulgated guidelines aim to maximize the value of research data by requiring researchers to plan prospectively for protecting, organizing, and maintaining scientific data to ensure the accessibility and quality of data produced as an outcome of NIH funding.7 Our team’s work dovetails with the 2023 NIH data management and sharing requirements by preserving TEAM-UP trial data in a transparent, end-user accessible format, thus ensuring that the data are reusable even after funding ends.
The next phase of research for this project could implement several additional features to enhance user accessibility and productivity. For example, custom data visualizations, near real-time updating of the database, and potentially a cloud-based platform are all possible extensions. An optimal approach might combine data from every source in real-time, ensuring all events are captured and potentially available as they occur. For this project, however, organizing our primary and secondary data into a database accessible to a wide variety of end-users without the requirements of technical expertise has been a successful first step in ensuring the research value of these data is available beyond the initial project time frame.
Limitations
The financial costs of hardware, software, and personnel associated with building and housing a relational database and dashboard are significant. These costs, including the ongoing updates that database systems require, need to be considered in initial research funding requests to support data management and sharing expectations.
Conclusions
The TEAM-UP relational database and customized dashboard safeguard the clinical trial’s data in a systematically organized, end-user accessible format consistent with the NIH data sharing and management guidelines that will take effect in 2023 and may serve as an example for future research teams.14 The benefits of using research data to the fullest potential far outweigh concerns about the cost because the data are now organized in a platform that can be accessed and explored by an increasing number of end-users for in-depth and secondary analysis, thus maximizing the value of the data for the current study and future exploration.
KEY POINTS
Good stewardship of research data includes a comprehensive data management plan to facilitate the organization and access of information for all end-users. Yet, healthcare research data management remains underdeveloped, and reports of poorly organized or lost data are common.
Effective data management is critical to facilitate knowledge discovery and address the increasing research data requirements for maintenance, retention, and dissemination.
This manuscript provides an exemplar for healthcare research data management by applying software technologies to data preparation and analytics with development of a relational database and customized dashboard/query instrument to democratize access and use of the large, diverse, and complex data collected for the TEAM-UP clinical trial.
Acknowledgments
Research reported in this publication was supported by the National Institute of Nursing Research of the National Institutes of Health under award number NIH R01NR016001.
Footnotes
ClinicalTrials.gov Registration: NCT02996331
Contributor Information
Jenny Alderden, Boise State University, Boise, Idaho, United States.
Phoebe D Sharkey, Sellinger School of Business, Loyola University Maryland, Baltimore, Maryland, United States.
Susan M. Kennerly, East Carolina University, Greenville, North Carolina.
Sanjay Ghosh, Duke University, Durham, North Carolina.
Ryan S Barrett, VP Credit Risk and Data Science, Acima, Draper, Utah.
Susan D Horn, School of Medicine, University of Utah, Salt Lake City, Utah.
Sayoni Ghosh, University of North Carolina, Charlotte, Charlotte, North Carolina.
Tracey L. Yap, Duke University, Durham, North Carolina.
REFERENCES
- 1. Briney KA, Coates H, Goben A. Foundational Practices of Research Data Management. Research Ideas and Outcomes. 2020;6:e56508. doi: 10.3897/rio.6.e56508
- 2. Little MM, St Hill CA, Ware KB, et al. Team science as interprofessional collaborative research practice: a systematic review of the science of team science literature. J Investig Med. 2017;65(1):15–22. doi: 10.1136/jim-2016-000216
- 3. Akers KG, Doty J. Disciplinary differences in faculty research data management practices and perspectives. International Journal of Digital Curation. 2013;8(2):5–26. doi: 10.2218/ijdc.v8i2.263
- 4. Jansen P, van den Berg L, van Overveld P, et al. Research Data Stewardship for Healthcare Professionals. December 22, 2018. In: Kubben P, Dumontier M, Dekker A, editors. Fundamentals of Clinical Data Science. Cham (CH): Springer; 2019:Chapter 4.
- 5. Ismail L, Materwala H, Karduck AP, et al. Requirements of Health Data Management Systems for Biomedical Care and Research: Scoping Review. J Med Internet Res. 2020;22(7):e17508. doi: 10.2196/17508
- 6. Kim Y, Yoon A. Scientists’ Data Reuse Behaviors: A Multilevel Analysis. Journal of the Association for Information Science and Technology. 2017;68(12):2709–2719. doi: 10.3886/E100404V1
- 7. Collins FS. Statement on Final NIH Policy for Data Management and Sharing. National Institutes of Health [Internet]. 2020 Oct 29 [cited 2022 Apr 16]. Available from: https://www.nih.gov/about-nih/who-we-are/nih-director/statements/statement-final-nih-policy-data-management-sharing
- 8. Abadi D, Agrawal R, Ailamaki A, et al. The Beckman report on database research. SIGMOD Rec. 2014;43:61–70.
- 9. Taichman DB, Backus J, Baethge C, et al. Sharing clinical trial data–a proposal from the International Committee of Medical Journal Editors. N Engl J Med. 2016;374:384–6.
- 10. Sheehan J, Hirschfeld S, Foster E, et al. Improving the value of clinical research through the use of common data elements. Clin Trials. 2016;13:671–6.
- 11. Morris ZS, Wooding S, Grant J. The answer is 17 years, what is the question: understanding time lags in translational research. J R Soc Med. 2011;104(12):510–520. doi: 10.1258/jrsm.2011.110180
- 12. Kennerly SM, Sharkey PD, Horn SD, et al. Characteristics of Nursing Home Resident Movement Patterns: Results from the TEAM-UP Trial. Adv Skin Wound Care. 2022;35(5):271–280. doi: 10.1097/01.ASW.0000822696.67886.67
- 13. Yap TL, Kennerly SM, Horn SD, et al. TEAM-UP for quality: a cluster randomized controlled trial protocol focused on preventing pressure ulcers through repositioning frequency and precipitating factors. BMC Geriatr. 2018;18(1):54. doi: 10.1186/s12877-018-0744-0
- 14. National Institutes of Health [Internet]. Final NIH Policy for Data Management and Sharing. 2020 Oct 29 [cited 2022 Apr 16]. Available from: https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html
- 15. European Pressure Ulcer Advisory Panel, National Pressure Injury Advisory Panel, and Pan Pacific Pressure Injury Alliance. Prevention and treatment of pressure ulcers/injuries: clinical practice guideline. The International Guideline. Haesler Emily (Ed.). EPUAP/NPIAP/PPPIA: 2019.
- 16. Cai S, Mukamel DB, Temkin-Greener H. Pressure ulcer prevalence among black and white nursing home residents in New York state: evidence of racial disparity? Med Care. 2010;48(3):233–239. doi: 10.1097/MLR.0b013e3181ca2810
- 17. Keelaghan E, Margolis D, Zhan M, Baumgarten M. Prevalence of pressure ulcers on hospital admission among nursing home residents transferred to the hospital. Wound Repair Regen. 2008;16(3):331–336. doi: 10.1111/j.1524-475X.2008.00373.x
- 18. Database Services [Computer software]. Version 9.4 (TS1M5). Oracle: ®. 2021.
- 19. Microsoft Corporation. Microsoft Excel [Computer Software]. Available from: https://office.microsoft.com/excel
- 20. SAS Institute Inc. 2016. SAS® 9.4 Formats and Informats: Reference. Cary, NC: SAS Institute Inc.
- 21. Wilkinson M, Dumontier M, Aalbersberg I, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3:160018. doi: 10.1038/sdata.2016.18
- 22. Abouelenein S, Williams T, Baldner J, Zozus MN. Analysis of Professional Competencies for the Clinical Research Data Management Profession. Stud Health Technol Inform. 2020;270:1199–1200. doi: 10.3233/SHTI200361
- 23. Raszewski R, Goben AH, Bergren MD, et al. A survey of current practices in data management education in nursing doctoral programs. J Prof Nurs. 2021;37(1):155–162. doi: 10.1016/j.profnurs.2020.06.003
- 24. Dilling TJ. Artificial Intelligence Research: The Utility and Design of a Relational Database System. Adv Radiat Oncol. 2020;5(6):1280–1285. Published 2020 Jul 13. doi: 10.1016/j.adro.2020.06.027