Skip to main content
Journal of Community Hospital Internal Medicine Perspectives logoLink to Journal of Community Hospital Internal Medicine Perspectives
. 2021 Sep 20;11(5):653–657. doi: 10.1080/20009666.2021.1954282

Using macros in microsoft excel to facilitate cleaning of research data

Justin Bauzon a,, Caleb Murphy b, Sandhya Wahi-Gururaj b
PMCID: PMC8462890  PMID: 34567457

ABSTRACT

Background: Retrospective chart review studies may be delayed by inability to export clean clinical data from an electronic medical record (EMR) or data repository. Macros are pre-programmed procedures that can be used in Microsoft Excel to help streamline the process of cleaning clinical datasets.

Objectives: To demonstrate how macros may be useful for researchers at community hospitals and smaller academic health centers that lack informatics support.

Methods: Using an intrinsic function of our institution’s EMR, vital signs and lab results from 20 individual hospitalizations were exported to a spreadsheet. Two macros were developed to sort through these datasets and output them into a specified format. The speed of macro-assisted data cleaning was compared to manual transcription.

Results: Time spent on data cleaning was significantly reduced when using macro-assisted sorting compared to the manual approach for both vital signs (46.5 seconds versus 12.3 minutes per record, a 94% reduction; P < 0.001) and labs (13.7 seconds versus 2.6 minutes per record, a 91% reduction; P < 0.001).

Conclusions:Macros offer a flexible and efficient tool for cleaning large sets of clinical data, particularly when an institution lacks informatics support or EMR functionality to export clinical data in an analysis-ready format.

KEYWORDS: VBA macro, microsoft excel, retrospective chart review

1. Introduction

Retrospective chart review research depends on obtaining clinical data in an efficient manner and in a format appropriate for statistical analysis. Researchers at community hospitals and smaller academic institutions are less likely to have assistance from informatics professionals who are trained in abstracting and cleaning clinical datasets retrieved from the electronic medical record (EMR) or a central data repository. Therefore, chart review projects in these settings often rely on investigators manually abstracting data from individual patient records or having to clean large sets of raw clinical data, both of which are time-consuming processes [1].

Our team received Institutional Review Board (IRB) exemption for a chart review study that required vital sign and laboratory results from a set of prior hospitalizations. The informatics service at our community-based teaching hospital helped us identify patients meeting our inclusion criteria but was unable to provide us with the requested retrospective data. The time required to individually transcribe each vital sign and lab value from the EMR to a spreadsheet for further analysis would have caused a significant delay in our study. We identified a method of exporting the necessary data from individual patient records using a clinician-facing function of our institution’s EMR, but the resulting datasets were disorganized and the process of methodically compiling this data using standard Excel functions would have also been time-consuming.

We developed two Microsoft Excel (MS Excel) macros using Visual Basic for Applications (VBA) to facilitate the process of cleaning and compiling the vital sign and lab results exported from our EMR. Macros, short for ‘macroinstruction,’ are customizable software applications that transform inputs based on a pre-specified procedure or algorithm. They are a versatile tool in research and administrative settings for handling repetitive, tedious tasks in spreadsheet management and statistical analysis [2–5]. We were unable to identify any publications that explore the utility of macros for sorting and formatting raw research data. In this paper, we describe our VBA macros and compare the efficiency of macro-assisted data sorting versus manual data entry. The macros are included as an online supplement file along with instructions to provide a model for researchers at other institutions (Appendix A and B). These macros may be particularly useful for unfunded or trainee-led research projects, particularly at community hospitals and smaller academic centers where robust informatics support is not available.

2. Materials and methods

The primary study for which our team was seeking data required specific variables from patient medical records that included vital signs (temperature, heart rate, respiratory rate, systolic and diastolic blood pressure), CBC results (leukocyte, hemoglobin, and platelet counts) and serum electrolyte panel (SEP) components (serum sodium, potassium, bicarbonate, and creatinine). To obtain this data, we used a clinician-facing function of our EMR that allows for filtering and downloading of these variables from individual patient records as a .CSV file. Two VBA macros, ‘Sort Vital Signs’ (SVS) and ‘Sort Lab Data’ (SLD), were then developed for use in MS Excel to sort the raw clinical data into a format appropriate for further analysis. We used these macros for two purposes: (1) removing data variables that were irrelevant to our study and (2) re-formatting the raw clinical data into a format conducive to statistical analysis, as the raw data included many interspersed text boxes, blank cells, rows, and columns.

To apply the SVS and SLD macro, the raw clinical data was copied from the exported file into the ‘EMR Sorter’ spreadsheet. Activating the SVS macro within MS Excel eliminated any information irrelevant to our study and sorted the vital sign data in an orderly, chronological fashion (Figure 1). Similarly, when activated, the SLD macro removed all unnecessary laboratory data and sorted the remaining relevant data (Figure 2). Detailed instructions on using the SVS and SLD macros are included in Appendix A (‘EMR Sorter Instructions’). These macros are included in Appendix B (‘EMR Sorter.xlsm’).

Figure 1.

Figure 1.

Comparison of exported vital signs before and after sorting

Figure 2.

Figure 2.

Comparison of lab results before and after sorting

2.1. Performance evaluation of the VBA macros

Time required to sort the relevant data (selected vital signs and lab results) from a sample of 20 patient charts into a spreadsheet using the VBA macro approach was compared to manual data entry using an unpaired t-test. Accuracy of data input between approaches was determined by comparing final data sets for consistency. Appropriate data security practices were followed throughout. Dates and times within figures were altered to maintain HIPAA compliance.

3. Results

The 20 sample hospitalizations included 1276 separate clinical data points, including 1059 vital signs, 108 CBC results, and 109 SEP component values. On average, there were 53 vital signs and 11 lab records (CBC and SEP) per record.

Using the manual entry approach, the vital sign data was entered into a spreadsheet in the desired format in 246.7 minutes, for an average of 738 seconds (or 12.3 minutes) per patient record (Figure 3). With the SVS macro, the vital signs data was sorted into the desired format in 15.5 minutes, for an average of 46.5 seconds per patient record (P < 0.001).

Figure 3.

Figure 3.

Time for vital signs entry

Manual transcription of the 20 patients’ lab results totaled 51.3 minutes, with a mean result of 154.1 seconds (or 2.6 minutes) per patient record (Figure 4). Using the SLD macro in MS Excel, lab data was transferred into the desired format in 4.6 minutes, averaging 13.7 seconds for each patient record (P < 0.001).

Figure 4.

Figure 4.

Time for lab results entry

Comparison of macro-assisted versus manually sorted results showed identical datasets, suggesting excellent accuracy in the overall transfer of data using macros.

4. Discussion

Electronic medical records and their use in hospital systems as repositories for clinical data collection can facilitate retrospective chart review-based research [6]. However, smaller academic centers and community hospitals may lack the necessary informatics support or software tools to export data from EMRs in a format conducive to analysis [7]. Furthermore, delays in accessing clinical data due to hospital constraints and underinvestment in informatic resources may lead to increased study costs [8].

Macros offer a flexible and perhaps underutilized tool for assisting researchers in obtaining clean, analyzable datasets. Our study demonstrates the efficiency of using macros as a tool for cleaning research data that may not be formatted correctly for analysis. Use of macros contributed to a 90% reduction in data acquisition and sorting time compared to manually entering data into a spreadsheet. Macros follow prewritten algorithms, virtually eliminating the possibility of erroneous data entry by humans if coded correctly. Larger datasets and varied levels of personnel training may be prone to greater error rates [9].

Our study demonstrates that creative approaches exist for researchers to overcome barriers in obtaining retrospective clinical data when traditional avenues, such as being provided analysis-ready data by an informatics service, are not available. Furthermore, the VBA macros described in this paper (and shared online) can be modified and repurposed for organizing a multitude of clinical data categories and formats. Access to MS Excel is available at most institutions, which allows for the facilitation of macros as a cost-friendly alternative to purchasing other commercial software for data analysis. MS Excel on its own presents its own challenges in medical research, many of which can be addressed with macro programming [10]. This is particularly relevant for medical trainees who are tasked with reviewing medical records for clinical data acquisition or organizing datasets.

There are some limitations to a macro-assisted approach to data cleaning. First, it is most useful if data can be obtained in bulk. In our study, vital sign and lab record data were exportable on a patient-by-patient basis from the Epic EMR, which allowed approximately 50 vital signs and 20 lab components to be sorted at a time. Second, the efficiency of macros may change based on the technical capabilities of the computer processors available to the user. Third, although VBA macros are versatile enough to be repurposed for other data formatting requirements, modification requires some basic understanding of programming. Completion of both macros for the purposes of this study took approximately a week, but this may vary widely depending on skill level. Finally, this approach requires careful adherence to HIPAA-compliant data security practices, although this is true for any retrospective chart review study.

Appendices.

Appendix A: EMR_Sorter_Instructions.docx

Appendix B: EMR_Sorter.xlsm

EMR Sorter Instructions

Enabling Macros

  1. Open the ‘EMR Sorter.xlsm’ workbook.

  2. A yellow ‘Security Warning’ bar will appear when you first open a workbook containing macros. Select ‘Enable Content’ → Select Enable Macros from the window that pops up.
    1. Note: only enable macros from a secure or trusted source.

Sort Vital Signs (SVS) Macro

  1. Export the desired vital sign data into a spreadsheet format (.XLS, .CVS).
    1. See ‘Sample Vitals’ tab for example of exported vital sign data.
  2. Select all of the vital sign data in the spreadsheet → Copy the data.

  3. In the ‘Raw Data’ tab in the workbook, paste the data into cell A1 (by default, the cell will contain the text ‘COPY/PASTEHERE’).

  4. Click the ‘Sort Vital Signs (SVS Macro)’ button on the spreadsheet.

  5. Enter a patient identifier in the popup window → select OK. The macro will start immediately.
    1. Depending on how much data is present, this may take some time.
  6. The data will be exported in the desired format onto the ‘Raw Data’ tab once the macro has completed.

Sort Lab Data (SLD) Macro

  1. Export the desired lab data into a spreadsheet format (.XLS, .CVS).
    1. See ‘Sample Labs’ tab for example of exported vital sign data.
  2. Select all of the lab data in the spreadsheet → Copy the data.

  3. In the ‘Raw Data’ tab in the workbook, paste the data into cell A1 (by default, the cell will contain the text ‘COPY/PASTEHERE’).

  4. Click the ‘Sort Lab Data (SLD Macro)’ button on the spreadsheet.

  5. Enter a patient identifier in the popup window → select OK. The macro will start immediately.
    1. Depending on how much data is present, this may take some time.
  6. The data will be exported in the desired format onto the ‘Raw Data’ tab once the macro has completed.

**Please do not hesitate to contact the authors for any clarifications.

Author’s contributions

JB developed the macro programming used in the study, performed the manual entry portion of the analysis, and was a major contributor in writing the manuscript. CM organized the study and analyzed the macro-processed data, as well as contributed to the manuscript. SW oversaw completion of the study and contributed to the manuscript. All authors read and approved the final manuscript.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • [1].Sarkar S, Seshadri D.. Conducting record review studies in clinical practice. J Clin Diagn Res. 2014;8(9):JG01–JG4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [2].Deochand N, Costello MS, Fuqua RW. Recording an excel® macro to specify date ranges for clinical data. Behav Anal Pract. 2016;9(3):230–234. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [3].Talay-Deǧirmenci I, Holmes CJ, Kuo PC, et al. An analytic decision support tool for resident allocation. J Surg Educ. 2013;70(1):31–35. [DOI] [PubMed] [Google Scholar]
  • [4].Aliane N. Data acquisition and real-time control using spreadsheets: interfacing excel with external hardware. ISA Trans. 2010;49(3):264–269. [DOI] [PubMed] [Google Scholar]
  • [5].Khan HASCEW, Microsoft A. Excel add-in for easy creation of survival curves. Comput Methods Programs Biomed. 2006;83(1):12–17. [DOI] [PubMed] [Google Scholar]
  • [6].De Moor G, Sundgren M, Karla D, et al. Using electronic health records for clinical research: the case of the EHR4CR project. J Biomed Inform. 2015;53:162–173. [DOI] [PubMed] [Google Scholar]
  • [7].Takenouchi K, Yuasa K, Shioya M, et al. Development of a new seamless data stream from EMR to EDC system using SS-MIX2 standards applied for observational research in diabetes mellitus. Learn Health Syst. 1028;3(1):e10072. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [8].Barick U, Vijaykanth A, Bharucha H, et al. are retrospective patient chart audits an affordable and rliable answer to healthcare data needs? Assessing the ground reality. Biomed J Sci Tech Res. 2018;7(2). DOI: 10.26717/BJSTR2018.07.001476 [DOI] [Google Scholar]
  • [9].Worster A, Haines T. Advanced statistics: understanding Medical Record Review (MRR) studies. Acad Emerg Med. 2004;11(2):187–192. [PubMed] [Google Scholar]
  • [10].Tanavalee C, Luksanapruksa P, Singhatanadgige W. Limitations of using microsoft excel version 2016 (MS Excel 2016) for statistical analysis for medical research. Clin Spine Surg. 2016Jun;29(5):203–204. . [DOI] [PubMed] [Google Scholar]

Articles from Journal of Community Hospital Internal Medicine Perspectives are provided here courtesy of Greater Baltimore Medical Center

RESOURCES