Med Student Res J. Author manuscript; available in PMC: 2016 Feb 22.
Published in final edited form as: Med Student Res J. 2013 May 31;2:21–29. doi: 10.15404/msrj.002.002.spring/03

A comprehensive stroke center patient registry: advantages, limitations, and lessons learned

James E Siegler 1, Amelia K Boehme 2,3, Adrianne M Dorsey 1, Dominique J Monlezun 1, Alex J George 1, Amir Shaban 4, H Jeremy Bockholt 5,6, Karen C Albright 2,3,7,8, Sheryl Martin-Schild 4,*
PMCID: PMC4762446  NIHMSID: NIHMS742696  PMID: 26913217

Abstract

Introduction

The use of a medical data registry allows institutions to manage information efficiently for the many investigations that draw on the registry and to evaluate patient trends over time, with the ultimate goal of identifying trends that may improve outcomes in a particular patient population.

Methods

The purpose of this article is to illustrate our experience with a stroke patient registry at a comprehensive stroke center and to highlight the advantages, disadvantages, and lessons learned in designing, implementing, and maintaining a stroke registry. We describe the registry methodology, common data element (CDE) definitions, the generation of manuscripts from the registry, and its limitations.

Advantages

The largest advantage of a registry is the ability to prospectively add patients, while allowing investigators to go back and collect information retrospectively if needed. The continuous addition of new patients increases the sample size of studies from year to year, and it also allows reflection on clinical practices from previous years and the ability to investigate trends in patient management over time.

Limitations

The greatest limitation of this registry stems from our single-entry technique, in which multiple points of data entry and transfer may introduce errors into the registry.

Lessons Learned

To reduce the potential for errors and maximize the accuracy and efficiency of the registry, we invest significant time in training competent registry users and project leaders. With effective training and transition of leadership positions, which are continuous and evolving processes, we have attempted to optimize our clinical research registry for knowledge gain and quality improvement at our center.

Keywords: stroke, registries, methodology, epidemiological methods, common data elements, source data verification

Introduction

Single-center registries of medical data are commonly created for clinical investigations across a variety of medical conditions, including stroke.1–5 Over the past 30 years, registries have been shown to improve the quality of care and patient prognosis and to reduce hospitalization costs by systematically delineating the standards of care to which institutions are expected to adhere. This holds true for stroke patient registries6–8 as well as for other medical registries.9–11 Additionally, registries are used to report hospital-level data to ‘Get with the Guidelines’12 (a multicenter effort to document and improve outcomes in patients with stroke and cardiovascular disease) and to maintain The Joint Commission Primary Stroke Center certification.

Despite the value of medical stroke registries, establishing and maintaining an up-to-date and accurate medical data registry has many limitations. These include incomplete registry data,13 difficulties with prospective data collection during patient hospitalization,14 errors in data collection and management,15 and poor standardization of definitions among common data elements (CDEs).15–18

The purpose of this article is to illustrate the advantages, limitations, and lessons learned during the creation of the registry used by the stroke program at the Tulane Medical Center, as well as how the center strives to minimize these limitations in the production and maintenance of the registry.

Methods

Patients and research personnel

The clinical registry was originally developed using a four-page case report form (CRF) to initiate data collection in preparation for the application for Primary Stroke Center certification and to address a specific study question related to the safety and efficacy of combined anti-platelet therapy during the acute phase of ischemic stroke.19 The larger registry includes all but a handful of data points requested by ‘Get with the Guidelines-Stroke’12 and all of the data points needed for reporting to The Joint Commission. The Joint Commission requires that all certified Primary Stroke Centers maintain these data on their patient population, treatment rates, and other information for quality improvement.

After the approval of the initial four-page CRF by the Tulane Medical Center Institutional Review Board in 2009, the expanded stroke registry was approved in 2011 to allow for inclusion of all patients who had a stroke diagnosis since the start of the stroke program in July 2008.

Tulane Medical Center is a 350-bed tertiary care center in downtown New Orleans, LA, serving a predominantly African American population insured largely through Medicare and Medicaid. See Table 1 and several recent publications for a description of the patient population.20–22 The stroke service evaluates approximately 500 patients with a stroke diagnosis each year (fewer than 15% of whom are transfers from outside hospitals) and is staffed by board-certified vascular neurologists. The stroke program meets the criteria of a comprehensive stroke center, offering 24/7/365 neurosurgical and endovascular care to its patients. Data from these patients are collected prospectively as described below. The senior leadership position is held by the Stroke Director, an academic neurologist with fellowship training in vascular neurology. Two hospital employees participate in data collection for the stroke registry, although they are not funded specifically for this activity. Neurology residents and medical students are also encouraged to participate; their duties are described in the ‘Creating a primary registry’ section. Despite receiving no dedicated funding, the research team has grown each year, from three students in year 1 to nearly 20 active members by year 5.

Table 1.

Patient population.

Diagnosis | Year 1 | Year 2 | Year 3 | Year 4 | Year 5 (first 6 months)
No. ischemic stroke | 185 | 261 | 309 | 291 | 174
No. treated with IV tPA (%) | 27 (14.6) | 69 (26.4) | 75 (24.3) | 100 (34.4) | 78 (44.8)
No. treated with IA tPA (%) | 4 (2.2) | 18 (6.9) | 16 (5.2) | 16 (5.5) | 10 (5.7)
No. TIA | 62 | 74 | 79 | 74 | 33
No. intracerebral hemorrhage | 38 | 57 | 60 | 58 | 34

IV tPA, intravenous tissue plasminogen activator; IA tPA, intra-arterial tissue plasminogen activator; TIA, transient ischemic attack.
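The percentages in Table 1 are consistent with using each year's count of ischemic stroke patients as the denominator. The short Python check below, written for this text rather than taken from the registry's own tools, recomputes them from the counts:

```python
# Counts transcribed from Table 1; the last entry covers only the first 6 months of Year 5.
ischemic = [185, 261, 309, 291, 174]
iv_tpa = [27, 69, 75, 100, 78]
ia_tpa = [4, 18, 16, 16, 10]


def percent_treated(treated, denominators):
    """Percentage of ischemic stroke patients treated in each year, to one decimal."""
    return [round(100 * t / d, 1) for t, d in zip(treated, denominators)]


print(percent_treated(iv_tpa, ischemic))  # [14.6, 26.4, 24.3, 34.4, 44.8]
print(percent_treated(ia_tpa, ischemic))  # [2.2, 6.9, 5.2, 5.5, 5.7]
```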

Creating a primary registry

Each CDE is defined in a codebook to standardize variable definitions and increase the inter-rater reliability of data acquisition. While some CDEs are straightforward and objective (e.g., admission vital signs), more subjective data points (e.g., pre-admission ambulatory status) gain legitimacy through consistency with the National Institute of Neurological Disorders and Stroke (NINDS) stroke-specific CDE standards.23 Although the NINDS CDE standards were released after the registry was designed, the definitions used for the registry match those in the CDE online module. This precise labeling and classification has allowed collaboration with other institutional stroke registries, so that registry variables can be synchronized and parameters harmonized between centers. The ultimate aim is to build larger studies and to corroborate our findings with those of other institutions.
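To make the idea of a codebook concrete, the Python sketch below represents each CDE as a record holding a written definition and the values it may take, and uses that definition to check recorded values. It is a minimal illustration under stated assumptions: the class name `CDE`, its fields, and the example entries (`admit_sbp`, `prestroke_ambulation`, the 50–300 mmHg range, and the category labels) are hypothetical. They are neither the registry's codebook nor the NINDS CDE format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CDE:
    """A hypothetical codebook entry: one data element with a fixed definition."""
    name: str
    definition: str
    kind: str  # "continuous" or "categorical"
    valid_range: Optional[Tuple[float, float]] = None  # continuous CDEs only
    allowed: Tuple[str, ...] = ()  # categorical CDEs only

    def validate(self, value) -> bool:
        """Return True if a recorded value conforms to this CDE's definition."""
        if self.kind == "continuous":
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                return False  # e.g., a word typed where a number belongs
            if self.valid_range is None:
                return True
            low, high = self.valid_range
            return low <= value <= high
        return value in self.allowed


# Illustrative entries only; names, ranges, and categories are assumptions.
CODEBOOK = {
    "admit_sbp": CDE("admit_sbp",
                     "First systolic blood pressure recorded on arrival (mmHg)",
                     "continuous", valid_range=(50, 300)),
    "prestroke_ambulation": CDE("prestroke_ambulation",
                                "Ambulatory status before the index stroke",
                                "categorical",
                                allowed=("independent", "with assistance",
                                         "unable", "not documented")),
}

print(CODEBOOK["admit_sbp"].validate(145))                           # True
print(CODEBOOK["admit_sbp"].validate("l45"))                         # False
print(CODEBOOK["prestroke_ambulation"].validate("with assistance"))  # True
```

Keeping the definition and the validation rule in the same record means that a change to a definition and a change to its check cannot drift apart.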

Consecutive patients evaluated at the center with a high clinical suspicion of stroke are prospectively added to a ledger by the stroke program coordinator. Once the diagnosis of stroke is confirmed, either clinically or by imaging, eligible patients are assigned a registry code number. Core measures and key clinical CDEs, including but not limited to baseline demographics, stroke classification, laboratory data, and other admission information (together, nearly half of all CDEs in our registry), are collected prospectively by the stroke program coordinator on a standardized paper version of our CRF (see Figure 1). In the days following admission, a board-certified vascular neurologist records key imaging and management data on this CRF.

Figure 1.


View of the paper and digital versions of our case report form (CRF).

A. Representative view of the paper case report form on which data are collected.

B. Screen view of the digital data collection tool (Microsoft Access 2007). Shown is a representative page in the collection tool that corresponds to the common data elements (CDEs) collected in the case report form (part A).

Key CDEs are selected for initial collection based on whether their responses can serve as filters for future studies. If an investigator develops an ancillary project idea based on a subpopulation of the registry, the key CDEs can help the investigator determine what additional information needs to be collected and how it should be collected. The investigator then applies for expedited Institutional Review Board (IRB) approval for the ancillary study, after which the additional variables can be collected from the electronic medical record and chart using a study-specific CRF (see ‘Supplementary data abstraction’ below). The remaining data regarding a patient's hospitalization, complications during the stay, and outcomes at discharge and at 3 months are collected retrospectively onto the CRF by other research team members trained in data collection (medical student volunteers, residents, nurse practitioners, and faculty). The only 90-day outcome measure collected is the modified Rankin Scale (mRS) score, a seven-point scale (0–6) that is the most commonly used functional outcome measure in neurological studies.24 Because The Joint Commission requires collection of the 90-day mRS and follow-up telephone calls for disease-specific certification, our stroke program coordinator obtains the 90-day mRS through a structured, validated telephone interview, except when the patient was seen in the stroke clinic within ±7 days of the 90-day time point and the mRS was documented at that visit.
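The rule for obtaining the 90-day mRS can be expressed as a small date check. In this Python sketch, the function names are hypothetical, and counting 90 days from the date of the index stroke is an assumption, since the text does not say whether follow-up is timed from onset, admission, or discharge.

```python
from datetime import date, timedelta
from typing import Iterable


def needs_phone_mrs(index_date: date, clinic_mrs_dates: Iterable[date],
                    follow_up_days: int = 90, window_days: int = 7) -> bool:
    """Return True if the 90-day mRS must be obtained by telephone interview.

    clinic_mrs_dates: dates of stroke clinic visits at which an mRS was documented.
    A visit within +/- window_days of the follow-up time point makes the call unnecessary.
    Counting from the index stroke date is an assumption of this sketch.
    """
    target = index_date + timedelta(days=follow_up_days)
    return not any(abs((visit - target).days) <= window_days
                   for visit in clinic_mrs_dates)


def valid_mrs(score) -> bool:
    """mRS scores are integers from 0 (no symptoms) to 6 (death)."""
    return isinstance(score, int) and not isinstance(score, bool) and 0 <= score <= 6


# Index stroke on 2013-01-01, so the 90-day time point is 2013-04-01.
print(needs_phone_mrs(date(2013, 1, 1), [date(2013, 4, 5)]))   # False (4 days away)
print(needs_phone_mrs(date(2013, 1, 1), [date(2013, 4, 10)]))  # True (9 days away)
```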

Reconciliation of CDEs

Once the CDEs have been gathered onto the paper CRF, a more experienced research team member manually reviews potentially inaccurate data points. When inaccurate data are suspected, that team member reviews the medical record and corrects the variable on the CRF, adding a date/time stamp indicating when the reconciliation was made and his or her initials. After all data on a paper CRF have been reviewed in this manner, the CRF data are transcribed via single entry into a secure, password-protected electronic master spreadsheet (Figure 1), and a second reconciliation takes place after the electronic transfer. Before analysis, each CDE used in a given research study is sorted from smallest to largest (for continuous variables) or from A to Z (for text variables) to identify gross transcription errors (e.g., a letter or word in place of a number). Next, continuous numerical data that lie more than two standard deviations from the mean for that CDE are identified and classified as ‘potentially erroneous data’. These data are validated or corrected by source data verification (SDV) after a second review of that patient's medical record. Once all data have been accurately collected and entered into the master electronic spreadsheet, the spreadsheet is transferred to a statistical software package for analysis, and the resulting statistical files are recognized by the research team as the updated primary registry. Each of these phases of primary registry creation has been approved by the Tulane Medical Center IRB.
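The two pre-analysis screens described above (finding gross transcription errors and flagging values beyond two standard deviations) can be sketched in Python. This is an illustrative automation of a process the team performs by sorting columns, not the team's actual procedure; the function name, the use of the sample standard deviation, and the glucose values are assumptions.

```python
import statistics


def screen_continuous_cde(values):
    """Screen one continuous CDE column before analysis.

    Returns (gross_errors, outliers) as lists of row indices:
    - gross_errors: entries that cannot be read as numbers (e.g., '1o5'),
      the kind of transcription error that sorting the column exposes;
    - outliers: numeric entries more than two standard deviations from the
      column mean, i.e. 'potentially erroneous data' queued for SDV.
    """
    numeric, gross_errors = [], []
    for row, raw in enumerate(values):
        try:
            numeric.append((row, float(raw)))
        except (TypeError, ValueError):
            gross_errors.append(row)
    xs = [x for _, x in numeric]
    if len(xs) < 2:
        return gross_errors, []  # too few numbers to estimate a standard deviation
    mean, sd = statistics.mean(xs), statistics.stdev(xs)  # sample SD (an assumption)
    outliers = [row for row, x in numeric if abs(x - mean) > 2 * sd]
    return gross_errors, outliers


# Hypothetical admission glucose values (mg/dL) as transcribed from paper CRFs.
glucose = ["102", "98", "110", "1o5", "95", "620", "101", "99", "104", "97"]
print(screen_continuous_cde(glucose))  # ([3], [5])
```

A flagged value is not necessarily wrong (a glucose of 620 mg/dL can be genuine), which is why flagged entries go to SDV rather than being deleted. Conversely, an error that stays inside the distribution, such as 110 transcribed as 101, passes both screens.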

Supplementary data abstraction

Once the primary registry is established, a researcher can propose a study question. The question is discussed with all investigators who would be involved in data abstraction, analysis, and drafting of the manuscript, and methodologists then refine it into a testable hypothesis. The research team can then anticipate all of the quantifiable CDEs needed to answer the question, including both data already in the primary registry and data that require re-review of patients' medical records. The CDEs needed for the study question are used to create a supplemental CRF for collecting the additional data, and the project principal investigator (PI) strictly defines the new variables of interest and adds them to the master codebook. Once the IRB has approved the proposed study, data are collected on the supplemental CRF and undergo the same SDV steps described above to ensure data validity. After these additional data have been gathered and validated in a supplemental electronic spreadsheet, they can be added to the secure master electronic spreadsheet. A summary of our data collection and interpretation methods is shown in Figure 2.
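Adding validated supplemental data to the master spreadsheet amounts to a join on the registry code number. The Python sketch below assumes each spreadsheet is a list of rows keyed by a hypothetical `registry_code` column. Refusing to overwrite existing master CDEs and reporting supplemental rows whose code is absent from the master are design choices assumed here, not steps described in the text.

```python
def merge_supplemental(master, supplemental, key="registry_code"):
    """Attach supplemental-CRF variables to master registry rows by registry code.

    master, supplemental: lists of dicts, one dict per patient row.
    Returns (merged_rows, orphan_codes). Orphans are supplemental codes with no
    master row; they are reported for review instead of being silently dropped.
    """
    by_code = {row[key]: row for row in supplemental}
    merged = []
    for row in master:
        extra = by_code.get(row[key], {})
        overlap = (set(row) & set(extra)) - {key}
        if overlap:
            # A supplemental CRF should add new CDEs, not redefine existing ones.
            raise ValueError(f"supplemental CRF repeats master CDEs: {sorted(overlap)}")
        merged.append({**row, **extra})
    master_codes = {row[key] for row in master}
    orphans = sorted(set(by_code) - master_codes)
    return merged, orphans


# Hypothetical rows; 'nihss' and 'liver_disease' are illustrative CDE names.
master = [{"registry_code": "R001", "nihss": 4},
          {"registry_code": "R002", "nihss": 12}]
supplemental = [{"registry_code": "R002", "liver_disease": True},
                {"registry_code": "R009", "liver_disease": False}]
merged, orphans = merge_supplemental(master, supplemental)
print(merged)
print(orphans)  # ['R009']
```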

Figure 2.


Summary of methods.

Advantages

In an attempt to minimize some of the errors inherent to registry production and maintenance, the following three objectives were applied to the medical data registry:

  1. The same CDEs are collected accurately and completely;

  2. Each CDE has a standardized definition; and

  3. Data are provided in a form that can be queried for future investigations.

Objective 1 ensures the abstraction of accurate, verifiable, complete, and relevant information. However, less controllable sources of data error still exist, such as errors in laboratory results and other medical data documentation from electronic medical records. The completeness of information is valuable for two reasons:

  1. All of the important facts for a given patient during their hospitalization are collected; and

  2. Each of these facts is collected across all patients in the registry, reducing bias in data abstraction.
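Completeness across patients can be monitored with a simple per-CDE report. In this hypothetical Python sketch, a CDE counts as missing when its key is absent or its value is `None` or an empty string; the function and field names are illustrative assumptions.

```python
def completeness(records, cdes):
    """Fraction of registry patients with a recorded value for each CDE.

    A value counts as missing if the key is absent, None, or an empty string.
    """
    if not records:
        return {cde: 0.0 for cde in cdes}
    missing = (None, "")
    return {cde: sum(r.get(cde) not in missing for r in records) / len(records)
            for cde in cdes}


# Hypothetical records with illustrative CDE names.
records = [
    {"admit_sbp": 150, "glucose": 110},
    {"admit_sbp": 132, "glucose": None},
    {"admit_sbp": 171},
    {"admit_sbp": "", "glucose": 95},
]
print(completeness(records, ["admit_sbp", "glucose"]))  # {'admit_sbp': 0.75, 'glucose': 0.5}
```

A CDE whose completeness falls well below that of its neighbors is a signal that it is not being collected consistently across patients.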

Objective 2 provides the framework for reliable and simple information. Simple but concrete definitions, standardized within the literature, are required to study specific associations between variables and to permit collaboration with other investigators when combining variables with the same definition.

Objective 3 facilitates economical and timely information abstraction. Timeliness matters because abstraction is commonly the rate-limiting step in any methodology: it may take an experienced data abstractor up to 90 min to complete one CRF and an additional 30 min to validate and transcribe the data into the electronic master spreadsheet. Because not all data from a given patient can be collected promptly, fundamental CDEs must be collected quickly for screening purposes and then reviewed retrospectively if more specific questions about a CDE arise. Together, these features of Objective 3 yield flexible data that can be used in many ways: reporting to ‘Get with the Guidelines’, creating reports for internal quality assurance, tracking changes within our institution, and contributing to scientific research.
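As a rough, illustrative upper bound, assume every patient required the maximum abstraction and transcription times quoted above, and take the approximately 500 patients with a stroke diagnosis seen each year as the registry's annual intake. The full-abstraction workload would then be:

$$
500~\tfrac{\text{patients}}{\text{year}} \times (90 + 30)~\tfrac{\text{min}}{\text{patient}} = 60{,}000~\tfrac{\text{min}}{\text{year}} = 1{,}000~\tfrac{\text{h}}{\text{year}}
$$

Under these assumptions, abstracting every CDE for every patient up front would be a large recurring commitment for a team without dedicated funding.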

These objectives comply with the MDR-OK categorization protocol (Mergeable data, Dataset standardized, Rules for data collection, Observations associated over time, and Knowledge of outcomes) from a previous review outlining effective medical data registry protocols,25 and they are consistent with the recommendations of the American Heart Association.26

The stroke registry serves a key function, as it provides a foundation from which other studies can originate and new hypotheses can be generated. Because the registry gives ideas room to develop, data abstractors may notice anecdotal trends or become curious about particular aspects of stroke care. This encourages a team approach to discussing novel study ideas, giving students the opportunity to design and implement a scientific investigation and allowing faculty members to develop their mentoring skills.

The largest advantage of having a registry is the ability to prospectively add patients to the registry, while allowing investigators to collect information retrospectively if needed. The continuous addition of new patients increases the sample size of our studies from year to year. Furthermore, the combination of prospective and retrospective data collection methods has been suggested as the most efficacious means for gathering data in terms of completeness and accuracy.13

Impact on quality improvement

The registry has also allowed investigations into this center's practices that inform internal quality improvement measures. Whenever hospital staff raise a question regarding complications or outcomes, the registry is queried to obtain the needed data. For example, an emergency department (ED) nurse expressed concern about treating patients who woke up with stroke symptoms with a thrombolytic. After IRB approval, the registry was queried, and we were able to report complication rates for this group of patients and compare them with complication rates for patients treated within the American Heart Association guidelines; the rates were similar. Although neither research nor quality improvement can be identified as the primary purpose of this registry, the registry has provided our institution with both types of information. In another example, we examined whether outcomes were compromised by a prolonged length of stay in the ED.27 We found that it was not the amount of time a patient spent in the ED, but rather being in the ED during the nursing shift change, that was associated with an increased frequency of pneumonia.27 This is one of the best examples at this center of a research query that led to a change in hospital management; however, many smaller changes have been implemented following research queries of the registry. Although important, these changes have not always resulted in peer-reviewed publications.
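A query like the wake-up stroke example reduces to tabulating a complication rate per treatment group. The Python sketch below only computes proportions; the text does not name the complications or the statistical test used, so the field names (including `sich` for symptomatic intracerebral hemorrhage) are hypothetical, and a formal comparison would additionally require an appropriate test.

```python
from collections import defaultdict


def complication_rates(records, group_key="treatment_group", outcome_key="sich"):
    """Proportion of patients with a complication in each treatment group.

    Field names are hypothetical; 'sich' (symptomatic intracerebral hemorrhage)
    stands in for whichever complication a query tabulates.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [patients with event, patients]
    for record in records:
        tally = counts[record[group_key]]
        tally[0] += bool(record[outcome_key])
        tally[1] += 1
    return {group: events / total for group, (events, total) in counts.items()}


# Made-up records for illustration only; they are not registry data.
records = [
    {"treatment_group": "wake-up", "sich": False},
    {"treatment_group": "wake-up", "sich": True},
    {"treatment_group": "within guidelines", "sich": False},
    {"treatment_group": "within guidelines", "sich": False},
    {"treatment_group": "within guidelines", "sich": True},
    {"treatment_group": "within guidelines", "sich": False},
]
print(complication_rates(records))  # {'wake-up': 0.5, 'within guidelines': 0.25}
```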

Limitations

As with all investigations and clinical data registries, our registry has drawbacks. One primary pitfall is that information is collected for the registry without a specific study in mind. Whether specific laboratory values are collected, imaging studies are ordered, and so on is therefore left to the discretion of the treating physician. Much of the information in the registry is collected retrospectively, which can be problematic if aspects of patient care needed for research purposes are not documented in the medical record.

While the checks and balances of multiple points of data entry have advantages, this feature also has a limitation. Each point of data transfer increases the likelihood that human error will affect the data and adds to the total time spent on the process, thereby decreasing efficiency.15 Because screening for irregularities is confined to outliers and gross typographical errors, minor errors may go undetected if they fall within the normal distribution for a given data point. According to a recent study, over half of the errors in clinical data gathering are due to data entry technique, but a substantial portion arises during the reconciliation process and appears to depend on the knowledge of research personnel.28 One unique feature of our registry is the similarity between the paper and digital versions of our data collection tool (see Figure 1). Because the two forms are nearly identical with respect to the data copied from the paper version to the electronic version, we have found that this reduces the risk of human error during transcription.

Furthermore, using multiple team members to abstract similar data points may compromise inter-abstractor reliability (i.e., a lack of consensus between abstractors on the definitions of data elements may lead to inaccurate data gathering)29 and may lead to abstractor drift (i.e., small changes over time in a given abstractor's understanding of CDEs may produce unforeseen discrepancies in data collection). We strive to minimize both with a very specific codebook of CDE definitions. Because most of our CDEs are collected prospectively by the stroke program coordinator and a trained vascular neurologist, little room is left for error by our remaining data abstractors. These errors may be reduced further with a double-entry approach,30 but such a methodology may not be efficient in large patient populations with large quantities of data.31
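For comparison with the single-entry technique used here, the double-entry approach can be sketched as two independent transcriptions of the same CRF compared field by field, with every disagreement resolved against the paper form. The function and field names in this Python sketch are hypothetical.

```python
def double_entry_discrepancies(entry_a, entry_b):
    """Compare two independent transcriptions of the same paper CRF.

    Returns {field: (value_a, value_b)} for every field on which the two
    entries disagree, including fields present in only one entry.
    """
    fields = sorted(set(entry_a) | set(entry_b))
    return {f: (entry_a.get(f), entry_b.get(f))
            for f in fields if entry_a.get(f) != entry_b.get(f)}


# Hypothetical CRF transcribed independently by two data-entry staff.
first = {"admit_sbp": 145, "glucose": 110, "mrs_90": 2}
second = {"admit_sbp": 145, "glucose": 101, "mrs_90": 2}
print(double_entry_discrepancies(first, second))  # {'glucose': (110, 101)}
```

Double entry catches plausible-looking transcription errors that a distribution-based screen cannot, at the cost of doubling the transcription effort.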

We also require a training period for all new research personnel, during which a more experienced supervisor (usually a senior medical student with two or more years of experience on our team) monitors new data abstractors and data entry personnel until the junior student can carry out these tasks accurately, efficiently, and without further assistance. During this time, the senior team member also devotes time to educating junior team members about the general aspects of stroke pathophysiology, clinical diagnosis, laboratory and imaging studies, and management. Bi-monthly meetings with our research personnel also give us the opportunity to review and discuss clinical data and their definitions in an open setting and to assess the status of our new and ongoing investigations.

Another disadvantage is that this is a single-center registry, which can offer insight only into the specific population of patients who present to our institution. This limits the generalizability of our results to other centers and populations. Our center is unusual in that it serves patients in the New Orleans area regardless of insurance status, and the source population of New Orleans, which lies in the ‘Stroke Belt’, is not representative of the United States as a whole.32 For this reason, we have written clear, specific variable definitions so that our registry can be combined with other registries to increase sample size and improve generalizability.

Lessons Learned

In establishing a stroke registry, we have learned many lessons regarding initiation of the registry, development of CDE definitions, and commencement of projects from the registry. One of these concerns the responsibility of the research project leader, which can be a double-edged sword. While the leadership experience that medical students and residents gain by piloting an independent study, working with a team from start to finish, publishing results in peer-reviewed journals, and presenting them at conferences is invaluable, follow-through and meeting deadlines can be challenging because of conflicting obligations. We have learned that communicating goals and interests is paramount, as it fosters a true teamwork approach in which students, residents, and faculty work closely together to complete projects in a timely manner. Bi-monthly meetings to communicate the status of the registry and related projects, along with the dissemination of meeting minutes and a running list of projects, papers, and abstract deadlines, have helped in establishing and re-establishing expectations and resource utilization.

We have also learned that investing time in carefully training research personnel in data collection techniques, variable definitions and classification, and data entry greatly reduces errors in data collection. At this center, all members of the research team must be certified in the NIH Stroke Scale examination33 and must complete IRB training and certification. New members also undergo a period of training and supervision by a more experienced team member, as described above. To maintain data accuracy, we also limit the reconciliation of data errors to trained and experienced clinical personnel, such as upper-level medical students and residents, who understand the biological and statistical meaning of the data elements and can more easily recognize outliers, errors, and inconsistencies (e.g., a patient coded as having expired who was in fact discharged home).
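Some of the inconsistencies that experienced reviewers look for can be written as explicit cross-field rules and run over every record to triage charts for expert review. This Python sketch is illustrative only: the field names (`expired`, `discharge_to`, `mrs_90`) and the rule set are hypothetical. The second rule relies on the fact that an mRS of 6 denotes death.

```python
# Hypothetical cross-field rules; field names and codes are illustrative assumptions.
CONSISTENCY_RULES = [
    ("expired in hospital but discharged home",
     lambda r: r.get("expired") is True and r.get("discharge_to") == "home"),
    ("expired in hospital but 90-day mRS is not 6 (death)",
     lambda r: r.get("expired") is True and r.get("mrs_90") not in (None, 6)),
]


def inconsistencies(record):
    """Names of all rules a registry record violates."""
    return [name for name, rule in CONSISTENCY_RULES if rule(record)]


print(inconsistencies({"expired": True, "discharge_to": "home", "mrs_90": 6}))
# ['expired in hospital but discharged home']
print(inconsistencies({"expired": False, "discharge_to": "home", "mrs_90": 1}))
# []
```

Such rules only route records to a reviewer; deciding which field is wrong still requires the clinical judgment described above.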

We have learned that it is important to inform the faculty and residents at the center about the registry. They should know which data elements are included so that they can help collect this information from patients and document these elements in their notes. At our center, we keep other faculty and residents informed about the registry by inviting them to our bi-monthly research meetings and by discussing our research results at regularly scheduled vascular conferences, grand rounds, and other meetings. We have also created templates for admission and discharge notes that include the most important CDEs.

The main lesson learned in this process is that data are more effectively and accurately collected when a stroke coordinator or other trained clinical personnel collect the majority of patient information prospectively, rather than retrospectively via chart review. Because of the active, prospective collection of data by this team member, with many elements collected for reporting to The Joint Commission for maintaining Primary Stroke Center certification, any uncommon data elements needed for the registry that are not intuitively gathered by residents or medical students (such as a specific history of liver disease) can be collected by the coordinator before the patient might be lost to follow-up.

It is worth disclosing that the registry's methods and protocols have evolved actively since its creation, and lessons learned during the early phases have already been applied to the current phase. For instance, we began data abstraction in 2008 with an almost entirely retrospective approach using a limited CRF (approximately four pages with just over 350 CDEs). In January 2011, the CRF was substantially revised to improve the efficiency and completeness of data collection. The revised CRF includes more data points that can be used for research queries (approximately 18 pages with over 1,000 CDEs) and is better organized with respect to the order in which information is collected. In our experience, although there are more data to abstract, the improved organization has dramatically shortened the time needed for data collection and improved the accuracy and completeness of each CDE in the registry.

Future Directions

Stroke is a leading cause of disability and death in the US population,34 and research with the use of registries has grown to be an effective way to improve the care and quality of life of individuals who suffer from this disease.6,35–37

Now that Tulane Medical Center's primary registry has been ongoing for several years (since 2008), future directions for the registry are under discussion. The electronic health record (EHR) system currently in use does not allow data to be captured electronically; instead, all information must be abstracted through manual searches, one data point at a time. The center is actively considering adoption of a new EHR system that meets the objectives of meaningful use (improving patient health care), which may help with future data collection once implemented. The next immediate step is to improve the integrity of our data abstraction and SDV. Our data entry methodology currently involves several points of data transfer using a single-entry technique, which has been associated with a low risk of data error. While we recognize that double entry would reduce this error rate, we agree with other investigators that double entry is not cost-effective given limited time and personnel. Furthermore, our SDV process is currently limited to identifying outliers in the data. In the future, we can improve the accuracy of our registry by performing SDV on a random selection of non-outlier data elements. We also hope to inspire other centers to develop their own stroke registries with well-defined variables that are consistent with the literature and with other stroke center registries.
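The proposed random SDV of non-outlier data elements could draw a reproducible sample of (record, field) cells that were not already flagged. The Python sketch below uses the standard-library `random.Random.sample`; the function name, identifiers, field names, and sample size are hypothetical.

```python
import random


def sample_for_sdv(record_ids, fields, already_flagged, k, seed=None):
    """Draw k random (record, field) cells for source data verification.

    Cells already flagged as outliers are excluded because they are verified anyway;
    a fixed seed makes the audit sample reproducible.
    """
    pool = [(rid, f) for rid in record_ids for f in fields
            if (rid, f) not in already_flagged]
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


# Hypothetical identifiers and CDE names.
ids = ["R001", "R002", "R003"]
fields = ["admit_sbp", "glucose"]
flagged = {("R002", "glucose")}
print(sample_for_sdv(ids, fields, flagged, k=3, seed=2013))
```

The proportion of sampled cells that disagree with the source record would give an estimate of the residual error rate among values that pass the outlier screen.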

Acknowledgments

Funding: This project was supported by Award Numbers 5 T32 HS013852-10 from The Agency for Healthcare Research and Quality (AHRQ) and 3 P60 MD000502-08S1 from The National Institute on Minority Health and Health Disparities (NIMHD), National Institutes of Health (NIH), and 13PRE13830003 from the American Heart Association.

Footnotes

Conflict of interest: The content is solely the responsibility of the authors and does not necessarily represent the official views of the AHRQ or the NIH.

References
