Journal of the American Medical Informatics Association: JAMIA. 2011 Sep 23;19(e1):e68–e75. doi: 10.1136/amiajnl-2011-000115

The design and implementation of an open-source, data-driven cohort recruitment system: the Duke Integrated Subject Cohort and Enrollment Research Network (DISCERN)

Jeffrey M Ferranti 1,2, William Gilbert 2, Jonathan McCall 3, Howard Shang 2, Tanya Barros 2, Monica M Horvath 2
PMCID: PMC3392865  PMID: 21946237

Abstract

Objective

Failure to reach research subject recruitment goals is a significant impediment to the success of many clinical trials. Implementation of health-information technology has allowed retrospective analysis of data for cohort identification and recruitment, but few institutions have also leveraged real-time streams to support such activities.

Design

Duke Medicine has deployed a hybrid solution, the Duke Integrated Subject Cohort and Enrollment Research Network (DISCERN), that combines retrospective warehouse data with clinical events contained in prospective Health Level 7 (HL7) messages to immediately alert study personnel to potential recruits as they become eligible.

Results

DISCERN analyzes more than 500 000 messages daily in service of 12 projects. Users may receive results via email, text pages, or on-demand reports. Preliminary results suggest DISCERN's unique ability to reason over both retrospective and real-time data increases study enrollment rates while reducing the time required to complete recruitment-related tasks. The authors have introduced a preconfigured DISCERN function as a self-service feature for users.

Limitations

The DISCERN framework is adoptable primarily by organizations using both HL7 message streams and a data warehouse. More efficient recruitment may exacerbate competition for research subjects, and investigators uncomfortable with new technology may find themselves at a competitive disadvantage in recruitment.

Conclusion

DISCERN's hybrid framework for identifying real-time clinical events housed in HL7 messages complements the traditional approach of using retrospective warehoused data. DISCERN is helpful in instances when the required clinical data may not be loaded into the warehouse and thus must be captured contemporaneously during patient care. Use of an open-source tool supports generalizability to other institutions at minimal cost.

Keywords: Patient recruitment, research subject selection, medical informatics applications, information systems, health information technology for economic and clinical health act, health level 7, clinical trials, informatics, health IT

Introduction

The capacity to identify, evaluate, and recruit potential research participants is crucial to the success of clinical and health services research. Many institutions, however, have difficulty finding potential subjects1 2 and thus may fail to meet accrual goals or deadlines. A continuing emphasis on harnessing clinical and administrative data sources for multiple uses, coupled with increasingly sophisticated health information technology (HIT) and informatics capabilities, may substantially improve the efficacy and cost-effectiveness of clinical trials recruitment efforts.3–6

Our institution, Duke Medicine, comprises two community hospitals, an academic facility (Duke University Hospital (DUH)), and more than 130 affiliated outpatient clinics. We report on the design and initial implementation of the Duke Integrated Subject Cohort and Enrollment Research Network (DISCERN), which combines data from the enterprise data warehouse and prospective Health Level 7 (HL7) messages to enable real-time alerting of potential enrollees for clinical, quality improvement, and research staff. We discuss several use cases, describe a preconfigured, self-service DISCERN task through our researcher portal, and compare DISCERN to existing real-time recruitment methods.

Background

Well-conducted, adequately powered clinical trials are essential to evidence-based medicine and quality-improvement activities. The success of such trials, however, depends largely on the capacity to identify, recruit, and enroll sufficient numbers of appropriate subjects within a highly constrained time frame.7 Recruitment efforts represent a substantial proportion of study resources and may consume as much as 50% of the total cycle time for a clinical trial.8 Findings from studies by Haidich and Ioannidis9 suggest that clinical trials are more likely to reach accrual targets when they achieve high rates of enrollment within their first 2 months. Conversely, failing to reach recruitment goals is implicated in the lack of success of numerous studies10 and imposes significant burdens on patients and healthcare consumers by delaying the translation of new therapies from the laboratory to the bedside.11

In some cases, recruitment targets are missed owing to inaccurate estimates of the pool of eligible participants.12–14 Many health systems possess extensive and rapidly growing data repositories that, when appropriately integrated and made accessible to clinical personnel, provide rich sources of information on demographics, clinical visits, diagnoses, procedures, and clinical orders that may be evaluated to identify individuals of interest to researchers. Various academic research centers have therefore sought to leverage their data resources to identify potential subjects.15–17 Commercially available18–20 and ‘home-grown’ applications15–17 21–26 have both attempted to leverage electronic health records (EHRs) for cohort identification, with varying levels of success and generalizability.

At Duke Medicine, we similarly sought to improve cohort discovery by creating Duke Enterprise Data Unified Content Explorer (DEDUCE), a Web-based query tool that affords the Duke research community access to our enterprise data warehouse, the Decision Support Repository (DSR), which integrates and stores more than 12 years of clinical and billing data from across the health system.27 Researchers who have been trained and granted appropriate access can query the DSR according to study criteria and create reports summarizing the number of eligible potential subjects, diagnoses, and demographics, as well as clinical locations visited.

However, we soon realized that a tool using warehoused data would not suffice for three key scenarios. First, it is possible that the associated databases do not capture the necessary information needed to support a study. For example, at DUH, the timestamp noting a nurse's view and acknowledgment of a physician order is not loaded into the DSR—an omission that creates a barrier to study designs examining the timing associated with STAT orders.

Second, eligibility may depend upon time-sensitive criteria. Such specifications may include not just the wide variety of clinical events communicated across HIT systems (such as emergent diagnoses, urgent physician orders, lab results, transfers between care settings, or presentation to an emergency department) but also the timing of these events relative to one another.

Third, investigators who have identified a cohort may still require options for outpatient monitoring after treatment has been provided. Depending on the subject area, a data warehouse integrates information from frontline clinical systems after a latency period ranging from 1 to 30 days,28 because the context needed to place data points into the schema often must be reconciled from numerous frontline systems serving distinct clinical workflows. In many instances, data fields needed for warehouse integration are only complete within each of these source systems at patient discharge. This precludes using DSR information to identify patients in real time as their clinical condition or location changes.

Such temporal challenges are not new to healthcare organizations. For some time, hospitals, medical centers, and health systems have relied on automated alerting and decision-support systems to serve their clinical and research missions. In a seminal study, Kuperman et al described an alerting system that electronically paged physicians with critical lab results for patients simultaneously on high-risk medications.29 As the authors sought to reconcile data from distinct clinical specialties, they noted two important factors for success: (1) the individual data streams fed to the decision-support logic must share a common electronic platform, and (2) interfaces must be available between decision-making and notification systems.

Since the publication of the study by Kuperman et al, significant efforts have been devoted to using real-time or near-real-time alerting for public health30 31 and bioterrorism32 surveillance. Mandl and colleagues note that a lack of common standards for data transmission can hamper or complicate efforts to share and use data across systems, although they also note that this circumstance may improve as the adoption and uptake of standards such as HL7, DICOM, and others become increasingly ubiquitous.32

Such considerations are relevant to a real-time application for cohort identification and recruitment. As we note above, a more ‘traditional’ design that relies on querying a central data repository has critical limitations related to latencies arising from the complexity of the healthcare environment. A hybrid recruitment system configurable to query both a data warehouse as well as data regarding events as they occur during patient care would fill this gap.

Methods

Model formulation

We sought to create an extensible, easy-to-implement workflow for cohort recruitment scenarios in which eligibility depends upon time-sensitive criteria or data not in the DSR. We identified HL7 messages that are created and exchanged when clinical data systems intercommunicate in support of immediate care as the best prospective data source to capture events that may shape trial eligibility, as well as to harness data that warehousing may not collect.

Duke Medicine uses an integration broker (Sun eGATE) to route HL7 messages holding information concerning care actions, orders, patient movement, medical document management, care results, and scheduling across distinct clinical applications. The integration broker handles the processing and translation that enables these applications, whether vended or home-grown, to exchange information.

Tapping the HL7 messaging stream for data relevant to clinical trials has two advantages. First, because HL7 messages passing through the integration broker are discarded by the accepting system once that information is transformed into actionable data, the stream represents our single source for certain data types (eg, timestamps surrounding clinical actions) that may not persist in the application for eventual warehousing. Second, HL7 messages hold only small parcels of information, such as an admission or transfer event. This allows quick sorting and evaluation for context relative to data warehouse queries or other recently transmitted HL7 messages. Thus, our model, DISCERN, deploys a persistent function that reads all clinical information exchanged by HL7 messages as they leave the broker, analyzes them for data potentially relevant to trial recruitment, reasons over supplemental information (eg, database query results or other incoming HL7 messages), and alerts personnel to potential recruits.
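To make the "small parcels" point concrete, the following is a minimal Python sketch of reading one HL7 v2 message. It is an illustration only, not the DISCERN implementation (the text states that DISCERN channels are scripted in JavaScript inside Mirth). The sample message, system names, and identifiers are hypothetical; the only facts relied upon are the standard HL7 v2 layout, in which segments are separated by carriage returns, fields by `|`, the message type is in MSH-9, and the patient identifier list is in PID-3.

```python
# Hypothetical sample ADT message; real feeds vary by sending system.
SAMPLE_ADT = "\r".join([
    "MSH|^~\\&|ADTSYS|DUH|DISCERN|DUKE|201001011230||ADT^A01|MSG00001|P|2.3",
    "PID|1||123456^^^DUH^MR||DOE^JANE||19900101|F",
    "PV1|1|I|PEDS^101^A",
])


def parse_hl7(message: str) -> dict[str, list[str]]:
    """Split an HL7 v2 message into segments keyed by segment ID.

    Only the first occurrence of each segment is kept, which is enough
    for this sketch but not for repeating segments such as OBX.
    """
    segments: dict[str, list[str]] = {}
    for line in message.split("\r"):
        if not line:
            continue
        fields = line.split("|")
        segments.setdefault(fields[0], fields)
    return segments


def message_type(segments: dict[str, list[str]]) -> str:
    # MSH-1 is the field separator itself, so MSH-9 sits at list index 8.
    return segments["MSH"][8].split("^")[0]


def patient_id(segments: dict[str, list[str]]) -> str:
    # PID-3 is the patient identifier list; its first component is the ID.
    return segments["PID"][3].split("^")[0]


segments = parse_hl7(SAMPLE_ADT)
print(message_type(segments), patient_id(segments))
```

Because each message carries only a handful of fields, a filter can decide in a single pass whether the message is relevant to any active job.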

We anticipate that DISCERN will increase recruitment scenario options by permitting time-dependent filter criteria that confer contemporaneous awareness of events happening across distinct clinical settings (eg, a recent patient readmission). The DISCERN workflow described below has three distinct stages: (1) definition (determination of project specifications), (2) configuration (arrangement of the necessary retrospective and prospective queries), and (3) reporting (provision of alert).

Stage 1: Definition of the DISCERN job

Because HL7 is an interoperability standard and not readily digestible by a casual clinical user, the first stage necessarily comprises a consultation between the DISCERN developer and the researcher to define functional requirements. The developer has an understanding of data provenance and is thus best suited to determine how HL7 streams may meet the user's needs. Users whose needs can be accommodated by a self-service retrospective query of the DSR may be directed to the DEDUCE portal. To use DISCERN, all users must have an active, institutional review board (IRB)-approved protocol explicitly stating that DISCERN is part of the recruitment strategy. DISCERN developers ensure that users are granted access only to data relevant to a study's explicit eligibility criteria.

Stage 2: Configuration of the DISCERN job

To reason over HL7 messages passed through the integration broker, we selected an open-source, modular tool that uses a widely available programming language. Mirth software (http://www.mirthcorp.com/ (WebReach)) is an open-source, standards-based solution for healthcare system interoperability that receives, interprets, processes, and sends healthcare information (such as HL7 messages) over a variety of protocols to support the exchange of health information.33 Mirth is packaged with a Java-based dashboard tool (Mirth Connect Administrator) that provides a graphical user interface (GUI). Within this dashboard, users may supply JavaScript to process HL7 messages exchanged by the integration broker.

A Mirth instance comprises a series of channels configurable within Mirth Connect Administrator. A channel is an interface that consists of a source connector (ie, data source specification), one or more filters (instructions to accept or reject incoming data), one or more transformers (instructions to modify accepted data), and one or more destination connectors (transmission of transformed data externally or to another channel).
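The channel anatomy described above can be sketched in a few lines of Python. This is a conceptual model only: the `Channel` class, its method names, and the dictionary message format are assumptions made for illustration and do not correspond to Mirth's actual API or configuration format.

```python
from dataclasses import dataclass, field
from typing import Callable

Message = dict  # assumed: an already-parsed message represented as a dict


@dataclass
class Channel:
    """Toy model of a channel: filters, then transformers, then destinations."""
    name: str
    filters: list[Callable[[Message], bool]] = field(default_factory=list)
    transformers: list[Callable[[Message], Message]] = field(default_factory=list)
    destinations: list[Callable[[Message], None]] = field(default_factory=list)

    def receive(self, msg: Message) -> bool:
        """Return True if the message was accepted and forwarded."""
        if not all(accept(msg) for accept in self.filters):
            return False
        for transform in self.transformers:
            msg = transform(msg)
        for send in self.destinations:
            send(msg)
        return True


outbox: list[Message] = []
lab_channel = Channel(
    name="lab-results",
    filters=[lambda m: m.get("type") == "ORU"],        # keep lab results only
    transformers=[lambda m: {**m, "flagged": True}],   # annotate accepted data
    destinations=[outbox.append],                      # stand-in for a connector
)

lab_channel.receive({"type": "ORU", "mrn": "123456"})  # accepted
lab_channel.receive({"type": "ADT", "mrn": "654321"})  # rejected by the filter
```

A destination can equally be another channel's `receive` method, which is how chained channels would be modeled in this sketch.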

To create a channel, the developer uses the Mirth Connect Administrator dashboard to perform three general steps (each of which is a distinct tab within the dashboard): (1) configuration of information acceptance, (2) configuration of information transmission, and (3) application of any instructions using a short, channel-associated JavaScript (eg, apply filters; call other code). In this context, ‘information’ may include HL7 messages, SQL queries, or other formats accessible through an ODBC interface. Mirth can also load a file of ‘test’ HL7 messages to be passed through the channel architecture in order to assess the fidelity of query processing for a project's objective. Outbound information can be sent to other channels or written to a database.

A DISCERN project can use multiple channels to accomplish objectives; contrariwise, a single channel can serve multiple projects. An unlimited number of channels can be created in Mirth, allowing developers to serve scenarios wherein required data points arrive via multiple HL7 messages. Mirth also permits storage of data in temporary tables or memory so that they can be reasoned over using additional information—either by way of HL7 messages or a separate query of another database (eg, a research subject registry). Figure 1 provides an overview of how DISCERN uses Mirth to process HL7 messages.

Figure 1.


Health Level 7 (HL7) message pathway through the Duke Integrated Subject Cohort and Enrollment Research Network (DISCERN) framework. All diamond elements represent Mirth channels in the DISCERN system. An HL7 message is created as part of patient care at a Duke Medicine facility and transmitted over Transmission Control Protocol/Internet Protocol from the integration broker and received by a Mirth reception channel. The message is stored for processing, and another Mirth channel performs a transformation, such as to XML format in this example. The converted message is then subject to any number of channels that perform study-specific tasks, such as transformation or filtering for specific lab result values. The execution of those channels for a particular DISCERN job is defined by a JavaScript created by the DISCERN developer. If a patient is identified as suiting the filter criteria, the result is stored in the data warehouse, and then a final Mirth channel is enacted that notifies the DISCERN user of a potential study candidate according to the method desired. DSR, Decision Support Repository.

To further illustrate the flexibility of Mirth channel architecture, we present a specific case from the Duke Heart Center in which a group of physicians desires immediate notification if one of their cardiac surgery patients is readmitted to the hospital or emergency department following discharge for the original surgery. Online appendix 1 describes the technical details for configuring this scenario, which has both prospective and retrospective query steps. The first Mirth channel is configured to examine and route all HL7 messages inbound from the integration broker. In this case, when an admission-discharge-transfer (ADT) message is intercepted, this first channel recognizes the message type and acts as ‘traffic control’ to route that information to a second channel dedicated to processing ADT messages. (This channel has a variety of instructions relevant to any project where ADT data are required for a real-time query, not just the example at hand.)

This second channel accepts the incoming data and complies with JavaScript instructions bound to the channel, which in this case are to send an SQL query to the DSR asking whether the patient represented in the HL7 message is present in a relational database that contains all recently discharged Duke Medicine cardiac surgery patients. If the ADT message pertains to a patient in the database, it returns a value of ‘True.’ This ADT channel is configured so that receiving a value of ‘True’ instigates the sending of information to a third channel responsible for real-time notification.

Channel 3 is a generic channel built to create an email alert for DISCERN users. Channel 2 sends Channel 3 the name of the provider to be notified, the identifiers of the readmitted patient, and the patient's current location. An email with these data is sent to alert the physicians that a recent cardiac surgical patient has been readmitted to a Duke Medicine hospital or emergency department.
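The three-channel flow described above can be sketched as follows. This is a hedged Python illustration, not the configuration given in online appendix 1: the table name, column names, email address, and message fields are all hypothetical, and an in-memory SQLite database stands in for the DSR.

```python
import sqlite3

# Stand-in for the DSR table of recently discharged cardiac surgery patients.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE recent_cardiac_discharges (mrn TEXT PRIMARY KEY, physician_email TEXT)"
)
db.execute(
    "INSERT INTO recent_cardiac_discharges VALUES ('123456', 'surgeon@example.org')"
)

sent_alerts: list[dict] = []


def notify_channel(alert: dict) -> None:
    """Channel 3: generic notification; here it only records the alert."""
    sent_alerts.append(alert)


def adt_channel(msg: dict) -> None:
    """Channel 2: retrospective lookup for the patient named in the message."""
    row = db.execute(
        "SELECT physician_email FROM recent_cardiac_discharges WHERE mrn = ?",
        (msg["mrn"],),
    ).fetchone()
    if row is not None:  # corresponds to the query returning 'True'
        notify_channel(
            {"to": row[0], "mrn": msg["mrn"], "location": msg["location"]}
        )


def router_channel(msg: dict) -> None:
    """Channel 1: 'traffic control' that routes messages by type."""
    if msg["type"] == "ADT":
        adt_channel(msg)


router_channel({"type": "ADT", "mrn": "123456", "location": "ED"})   # alert sent
router_channel({"type": "ADT", "mrn": "999999", "location": "ED"})   # not in cohort
router_channel({"type": "ORU", "mrn": "123456", "location": "LAB"})  # not an ADT
```

The prospective step (the incoming ADT message) supplies the event and its timing, while the retrospective step (the SQL lookup) supplies the cohort membership.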

This case demonstrates the hybrid query scenario for a project that requires both real-time and archival data. At Duke Medicine, an approach restricted to a retrospective examination of the data warehouse could alert personnel to a readmitted patient only well after the care event had occurred. In many cases, this would take place after the patient has been discharged and the billing process has begun, which is too late to ensure that the providers responsible for the prior surgery are made aware of the patient's readmission and treatment.

This example showcases just one portion of the channel architecture that enables numerous DISCERN projects. Channels can support multiple queries, as JavaScript details can apply sorting logic and other filters to HL7 messages to suit a given project. When a new project is brought to DISCERN developers, it is integrated into the existing channel architecture, because the developer ultimately seeks to minimize the number of active channels in order to easily manage distinct projects. The developer may also consolidate channels with a generic functionality shared by multiple projects, such as sending a text page or parsing ADT data. For any DISCERN project, the two final channels in the DISCERN workflow will (1) notify the researcher of any identified patients meeting study criteria and (2) place the information into a DISCERN database with its own subschema within the DSR for the purpose of later error-trapping if needed. The DISCERN service runs automatically without further interaction until deliberately stopped.

Stage 3: DISCERN reporting

In the final stage of the DISCERN workflow, a notification may be sent to the researcher immediately (eg, SMS message, text page, or email) or in aggregate (spreadsheet file or on-demand report written in a business intelligence program). When designing requirements in the initial consultation, the developer would advise the end user as to which data-reporting method is likely to yield the best results and not subject the user to alert fatigue. DISCERN jobs are automatically triggered to cease reporting once an IRB protocol expires.
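As a rough illustration of the reporting stage, the Python sketch below routes a matched record either to an immediate queue or to an aggregate report, and stops reporting once the protocol has expired. The `DiscernJob` structure, field names, job names, and dates are hypothetical, and treating the expiry date as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DiscernJob:
    name: str
    delivery: str       # "immediate" (page, SMS, email) or "aggregate" (report)
    irb_expires: date   # assumed: the job is valid through this date, inclusive


def dispatch(job, match, today, immediate_queue, report_rows):
    """Route one matched patient record; return False if the job has expired."""
    if today > job.irb_expires:
        return False
    if job.delivery == "immediate":
        immediate_queue.append((job.name, match))
    else:
        report_rows.append((job.name, match))
    return True


pages, report = [], []
cooling = DiscernJob("cord-blood", "immediate", date(2010, 12, 31))
hpv = DiscernJob("hpv-vaccine", "aggregate", date(2010, 6, 30))

dispatch(cooling, {"mrn": "123456"}, date(2010, 7, 1), pages, report)
dispatch(hpv, {"mrn": "654321"}, date(2010, 7, 1), pages, report)  # expired, dropped
```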

Results

DISCERN currently reads 500 000 messages per day in service of 12 distinct projects organized into 11 Mirth channels. A summary of current DISCERN use cases is shown in table 1. While most DISCERN jobs concern cohort recruitment, others have applied similar principles to quality-improvement projects, thus demonstrating the model's versatility. The reporting method correlates with the window of time available to the researcher to recruit the subject, given the study parameters. The capability of the Mirth engine to listen prospectively to real-time HL7 messages is central to the success of the DISCERN workflow. For any use case in table 1 that must report subject identification immediately (ie, by text page or email), a method that places this information into databases or data marts for later query could not have instantly notified the study coordinators of a potential recruit.

Table 1.

Summary of Duke Integrated Subject Cohort and Enrollment Research Network use cases

Type | Delivery | Use case | Filter criteria | Data points provided
Research/recruiting | Email and text page | Identify patients with positive blood bacteria cultures currently admitted to DUH | Lab result | Patient identifiers; lab result, inpatient location
Research/recruiting | Text page | Identify DUH newborns at risk of ischemic brain injury to collect umbilical cord blood cells immediately after birth | Location; order entry for ‘total body cooling’ | Patient identifiers; inpatient location
Research/recruiting | Email | Identify patients admitted to two DUH pediatric locations or seen in the emergency department by an attending oncologist, with the goal of detecting possible dangerous fungal infections | Location, admitting physician | Patient identifiers, admitting physician, diagnosis, inpatient location
Research/recruiting | On-demand reporting | Identify patients with new ICD-9 diagnoses for atrial fibrillation or atrial flutter for recruitment into an anticoagulant study | ICD-9 code | Patient identifiers, upcoming appointments
Research/recruiting | On-demand reporting | Identify chronic gout sufferers with high uric acid lab results | ICD-9 codes, age, lab value | Patient identifiers, demographics, lab result, upcoming appointments
Research/recruiting | On-demand reporting | Identify adult asthma sufferers | Age and ICD-9 code | Patient identifiers, upcoming appointments
Research/recruiting | On-demand reporting | Recruit adolescent girls who have received one or two doses of the HPV vaccine series | Age and ICD-9 code | Patient identifiers, upcoming appointments
Research/monitoring | Text page | From a list of patients registered for a phase II pediatric clinical trial receiving inositol at DUH, identify instances when creatine values rise above the threshold value mandating discontinuation of investigational drug | List of patients enrolled in IRB protocol; lab result | Patient identifiers, inpatient location, lab result
Quality improvement | Pop-up alert sent to nearest computer in nursing unit | Immediately alert neonatal intensive care nurses when an urgent ‘STAT’ order is entered by a physician in the computerized provider order entry system by sending a pop-up alert to the computer closest to the affected patient | Computerized provider order entry order type; patient location | Patient identifiers
Quality improvement | On-demand reporting | Identification of high-risk pediatric patients who should be prioritized for H1N1 vaccine delivery during a vaccine shortage | Age and ICD-9 code | Patient identifiers; upcoming appointments
Quality improvement | Email | Identification of pediatric patients with an admitting diagnosis indicating asthma | Age; text matching for asthma-related terms | Patient identifiers; inpatient location; attending physician
Quality improvement | Email | Identify Duke Medicine Heart Center surgery patients recently readmitted at a Duke Medicine hospital and alert the discharging physician | Patient discharged by Duke Heart Center within last 31 days | Patient identifiers; inpatient location

DUH, Duke University Hospital; ICD-9, International Classification of Diseases, 9th revision.

DISCERN use cases for patient recruitment

Umbilical-cord blood collection

DISCERN was used in a study involving the prospective collection and intravenous readministration of umbilical cord blood cells, which is hypothesized to be a neuroprotective adjunct to total-body cooling in cases of neonatal hypoxic ischemic encephalopathy (HIE). The study protocol mandates that cord blood be procured and prepared immediately after birth and administered within the first days of life. Obstetricians were asked to collect cord blood at every high-risk delivery; however, given the acuity of these deliveries, there were multiple missed opportunities for recruitment, particularly among infants born late at night.

From July through December of 2009, six infants delivered at DUH underwent total body cooling, but the study team was not notified of candidates for 6–24 h. As a result, no cord blood units were procured, and no patients were enrolled in the trial. A DISCERN process was constructed that (1) looked for a nursing order of ‘total body cooling,’ (2) paged a study coordinator to start the consent process, and (3) paged the study investigator, who could contact the obstetrician to ensure cell procurement. By backtracking to deliveries of children with hypothermia orders but no cord-cell collections, we identified issues with the stocking of the sterile cell procurement kits as well as with obstetrician availability, underscoring not only DISCERN's usefulness for patient recruitment, but also its application to process and quality improvement. Once the DISCERN process was in place and the cord-blood collection process was optimized, nine patients were cooled, of whom eight were enrolled in the study from January to November of 2010.

Human papillomavirus vaccine study

DISCERN was used to enroll patients in a study investigating the timing of administration of a human papillomavirus (HPV) vaccine series; specifically, the effects on the immune system if doses are not given at the specified intervals (0, 2, and 6 months). The study requires blood to be drawn from subjects at the administration of the third vaccine dose and again 1 month later. Researchers thus needed to be able to quickly find patients residing in the appropriate window of the vaccine series.

To identify a cohort for this study, DISCERN scanned for adolescent girls who received the first or second shot in the series, as identified by Current Procedural Terminology (CPT) billing codes. Results were made accessible to researchers through on-demand reporting that showed patient identifiers and upcoming appointments. The study team then contacted the attending physician to obtain permission to approach the patient at the next appointment.

In the first 10 months of the study, subjects were recruited using traditional pathways, which resulted in 43 enrollees out of 448 patients approached (an enrollment rate of 9.6%; figure 2). Starting in April of 2010, DISCERN was used to identify potential recruits. Over 4 months, there were 62 enrollees out of 421 patients approached, an enrollment rate of 14.7%. Not only was this rate 53% higher in relative terms (Pearson χ2 test, p=0.02), but the pace of recruitment also increased from an average of 4.3 subjects/month to 15.5 subjects/month. The study team also reported a reduction in eligibility screening time, from >10 h/week to 4 h/week.
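The reported figures can be reproduced from the counts given above. The Python sketch below recomputes the enrollment rates, the relative increase, the monthly pace, and a Pearson χ2 test on the 2×2 table of enrolled versus not-enrolled patients. It assumes no continuity correction was applied, which the text does not state; under that assumption the p value comes out at approximately 0.02.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


before_enrolled, before_approached, before_months = 43, 448, 10
after_enrolled, after_approached, after_months = 62, 421, 4

rate_before = before_enrolled / before_approached   # about 0.096
rate_after = after_enrolled / after_approached      # about 0.147
relative_increase = rate_after / rate_before - 1    # about 0.53

chi2, p = pearson_chi2_2x2(
    before_enrolled, before_approached - before_enrolled,
    after_enrolled, after_approached - after_enrolled,
)

print(f"rates: {rate_before:.1%} -> {rate_after:.1%} (+{relative_increase:.0%})")
print(f"pace: {before_enrolled / before_months:.1f} -> "
      f"{after_enrolled / after_months:.1f} subjects/month")
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```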

Figure 2.


Human papillomavirus (HPV) vaccine study recruitment. The number of approached and enrolled subjects is plotted monthly for a study seeking individuals receiving the HPV vaccine series. In months following Duke Integrated Subject Cohort and Enrollment Research Network (DISCERN) deployment (April 1, 2010), the enrollment rate increased from 9.6% to 14.7% (p=0.02).

Pediatric asthma study

As part of a quality-improvement project, a group of DUH pediatricians sought to identify all pediatric asthma patients admitted to the hospital. One challenge presented by this project was that children with new diagnoses of asthma at DUH would not have an indication of their condition in the DSR. DISCERN was used to monitor multiple HL7 message types to identify patients who were (1) <20 years old, (2) admitted to a general pediatrics floor, and (3) given an admitting diagnosis of asthma as entered into the computerized provider order entry system. Because admitting diagnoses are captured only as free text by computerized provider order entry, a DISCERN channel was configured to use regular expression matching to search the text string for a series of terms indicative of asthma, as well as common abbreviations and misspellings. Because a diagnosis of asthma is difficult to make in young children, the search term list was expanded to include more general terms such as ‘wheezing.’
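A free-text filter of this kind might look like the following Python sketch. The term list, the misspellings, and the unit code `GENERAL_PEDS` are hypothetical, since the actual search terms used in the project are not given in the text; only the three criteria (age, location, and text match) come from the description above.

```python
import re

# Hypothetical term list: the project's actual terms are not listed in the text.
ASTHMA_PATTERN = re.compile(
    r"\b(asthma\w*|asthama|athsma|reactive airway\w*|wheez\w*)\b",
    re.IGNORECASE,
)


def is_candidate(age_years: int, unit: str, admitting_dx: str) -> bool:
    """Apply the three criteria: age < 20, general pediatrics floor, text match."""
    return (
        age_years < 20
        and unit == "GENERAL_PEDS"  # assumed location code
        and ASTHMA_PATTERN.search(admitting_dx) is not None
    )


print(is_candidate(7, "GENERAL_PEDS", "ASTHMA EXACERBATION"))      # matches
print(is_candidate(3, "GENERAL_PEDS", "wheezing, r/o pneumonia"))  # matches
print(is_candidate(7, "GENERAL_PEDS", "fractured radius"))         # no match
```

Broad terms such as "wheezing" raise sensitivity at the cost of false positives, which is why matches are sent to clinicians for review instead of being treated as confirmed diagnoses.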

When the triggering conditions for the channel were met, DISCERN sent an email that contained the admitting physician, matching admission diagnosis, and patient medical record number to four other pediatricians. In a 1-month pilot phase, DISCERN identified 15 patients, 13 of whom were true asthmatics included in the QI project. Given the architecture of HIT systems at DUH, this work would not have been possible using a researcher query tool such as i2b2, as the admitting diagnosis is not available for database query until the medical billing process has been completed postdischarge. Without DISCERN, physicians would have been obliged to resort to daily manual checking of EHRs for eligible patients.

Self-service DISCERN applications

Due to the need to maintain data provenance, clinicians wishing to use the DISCERN tool for a unique query must work with an analyst who configures Mirth. However, upon fulfilling several requests, it became clear that a simple and common DISCERN use case could be automated within the context of our DEDUCE researcher portal. One of the most frequent requests was for access to timely information regarding future clinic appointments for a list of patients. Within the DEDUCE portal, researchers may (in a self-service manner) either define a cohort by querying the organizational data warehouse or upload a list of medical record numbers. Within this portal, they now have access to a DISCERN ‘button’ that with one click provides a current report detailing all upcoming clinic appointments, including date, time, clinic name, provider, clinical service, visit type, and visit reason. This allows trial coordinators to meet the potential recruit's physician at the next healthcare encounter and request permission to offer a trial to the patient. In this way, DEDUCE developers are entirely removed from the logistics of defining a new DISCERN query because the underlying Mirth channels already exist. As additional reusable use cases are identified, they will be tested and automated for self-service within the DEDUCE researcher portal.
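The self-service report amounts to filtering scheduled appointments by a list of medical record numbers and keeping only future visits. A minimal Python sketch follows, with hypothetical field names and sample data; the actual report draws on live scheduling information and includes more columns (provider, clinical service, visit type, and visit reason).

```python
from datetime import datetime


def upcoming_appointments(mrns, appointments, now):
    """Return future appointments for the given MRNs, soonest first."""
    wanted = set(mrns)
    rows = [a for a in appointments if a["mrn"] in wanted and a["when"] > now]
    return sorted(rows, key=lambda a: a["when"])


now = datetime(2010, 4, 1, 9, 0)
appointments = [
    {"mrn": "123456", "when": datetime(2010, 4, 15, 10, 30), "clinic": "Pediatrics"},
    {"mrn": "123456", "when": datetime(2010, 3, 1, 8, 0), "clinic": "Pediatrics"},
    {"mrn": "999999", "when": datetime(2010, 4, 2, 14, 0), "clinic": "Cardiology"},
]

cohort_report = upcoming_appointments(["123456"], appointments, now)
for row in cohort_report:
    print(row["mrn"], row["when"].isoformat(), row["clinic"])
```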

Discussion

Summary

Clinical investigations and quality-improvement research often require real-time data on patient care actions. Novel HIT applications that offer access to retrospective data in support of clinical trial research such as STRIDE,34 i2b2,35 and DEDUCE27 have been described in the literature. DISCERN builds upon the capabilities afforded by these applications by providing a hybrid framework for clinical trial recruitment that uses HL7 messages containing real-time data on clinical events to complement more traditional approaches utilizing warehoused data. The functionalities afforded by access to real-time data become particularly important when timing issues are critical to study enrollment, multiple events within distinct HIT systems must be cross-evaluated to confirm eligibility, and/or desired data are not permanently captured by the target clinical application for warehouse loading. We also posit that in addition to improving the mechanics of data capture, DISCERN's filtering capabilities can contribute to better clinical research associate workflow by expediting in-person prescreening for trial eligibility during clinic visits.

Comparison to other subject recruitment tools

Most reported cohort recruitment tools rely on retrospective information and are not able to use data collected prospectively during clinical care.36–39 Whereas warehoused data are often standardized and organized into cross-referenced tables, clinical data from operational systems are newly created, quickly communicated, and often standardized only to the extent required for intercommunication. Although warehoused data may have timely refresh rates, these are typically on the order of daily or weekly. Thus, different tools are needed to tap into each data type, particularly when the time window for recruiting a subject is narrow or study eligibility criteria are temporally constrained.

The use of HL7 messaging specifically to support clinical trial recruitment is not widespread. In a recent conference proceeding, Weber and colleagues describe two pilot studies in which the Java-based Esper engine was used to deploy custom Java classes that could read HL7 messages and filter them according to multiple recruitment criteria using a variation of SQL.40 As with Mirth, the Esper engine is open source, but the authors note that an adapter for HL7 messages had to be manually coded, suggesting that significant local configuration was required. In addition, the Esper-based system is purely real-time in nature and presently does not permit consulting with relational databases, although such hybrid capabilities are slated for future development. In contrast, Mirth is purpose-designed as a tool for healthcare interoperability and has built-in HL7-reading functionality within a GUI; in addition, its hybrid design facilitates the use of both real-time data and retrospective, warehoused information.

Most other cohort recruitment tools that rely on real-time data sources typically incorporate scripted searches executed at repeated, frequent intervals. For example, the RealTime Recruiting tool15 scans emergency department registration records every 2 min for criteria that match study requirements,17 and Cardozo and colleagues report on a program that searches emergency department registration records at 15 min intervals.22 These methods, however, are limited to the body of information contained within a single system, hindering their generalizability to other settings. In a recent paper, Nelson and colleagues report on a near–real-time alerting system that was deployed at the Mayo Clinic to monitor for instances of septic shock.41 A research data mart was queried hourly for a complex series of predefined rules and was updated with information from ICU patients' EHRs as frequently as every 15 min, thus receiving information from numerous systems. Such a system offers the benefit of being relatively simple to implement; however, the continuous query model may not scale well in larger populations. DISCERN, on the other hand, draws from a confluence of health-system-wide data streams, as opposed to a single care setting or department. Instead of performing constant queries, the Mirth engine filters HL7 messages for applicability to an existing DISCERN job.
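The contrast between repeated querying and message filtering can be illustrated with a short sketch. It is written in Python rather than in Mirth's own JavaScript, and it is not the DISCERN code: the function names, the trigger event, and the sample messages are assumptions. It relies only on the standard pipe-delimited HL7 v2 layout, in which MSH-9 carries the message type and PID-3 the patient identifier. Each incoming message is tested once against a job's criteria, and the cohort set stands in for a list derived beforehand from the data warehouse.

```python
def parse_segments(message):
    """Split a pipe-delimited HL7 v2 message into {segment_id: fields}."""
    segments = {}
    for line in message.strip().split("\r"):
        fields = line.split("|")
        segments.setdefault(fields[0], fields)  # keep the first occurrence only
    return segments


def matches_job(message, trigger_events, cohort_mrns):
    """True if the message is a trigger event for a patient in the cohort."""
    seg = parse_segments(message)
    msh, pid = seg.get("MSH"), seg.get("PID")
    if msh is None or pid is None or len(msh) < 9 or len(pid) < 4:
        return False
    # After splitting on "|", index 8 of the MSH segment is MSH-9 (message type).
    event = "^".join(msh[8].split("^")[:2])
    # PID-3 is the patient identifier list; take the ID component.
    mrn = pid[3].split("^")[0]
    return event in trigger_events and mrn in cohort_mrns


# Made-up messages for illustration only.
def make_message(event, mrn):
    return "\r".join([
        r"MSH|^~\&|ADT_SYS|HOSP|DISCERN|HOSP|201101100800||" + event + "|MSG0001|P|2.3",
        "PID|1||" + mrn + "^^^HOSP||DOE^JANE",
    ])


warehouse_cohort = {"MRN001", "MRN002"}   # assumed result of a retrospective query
job_triggers = {"ADT^A01"}                # assumed trigger: inpatient admission

print(matches_job(make_message("ADT^A01", "MRN001"), job_triggers, warehouse_cohort))
print(matches_job(make_message("ADT^A03", "MRN001"), job_triggers, warehouse_cohort))
print(matches_job(make_message("ADT^A01", "MRN999"), job_triggers, warehouse_cohort))
```

Because each message is evaluated as it arrives, the work scales with message volume rather than with the polling interval and the size of the queried table.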

In addition, a number of models described in the literature target only a single point of contact in a specific role (eg, a care provider or principal investigator). Data entered during patient evaluations and treatment planning are screened in real time for criteria that, if met, trigger a pop-up prompt to alert the provider of eligibility during the examination.19 23 42 However, Grundmeier and colleagues report that this method is not always easily integrated with provider workflow. They note that physicians often ignore pop-up alerts, citing lack of sufficient time both to treat the patient and explain a study for recruitment purposes.23

In contrast, DISCERN allows multiple persons performing a variety of roles to be notified following a trigger, and the mode and frequency of an alert can be tailored to the project's needs. Further, the various permutations of staff and types of notification can be used synchronously. Investigators and staff can opt to have DISCERN batch reports every few hours or send a page immediately upon encountering a trigger event. This type of customization, useful in improving recruitment for a particular study, also lends itself to a variety of investigatory avenues.
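The per-recipient choice between immediate and batched alerts can be sketched as follows. This is an illustrative model only; the `Notifier` class, the mode names, and the recipients are hypothetical and do not describe how DISCERN delivers email or pages.

```python
from collections import defaultdict


class Notifier:
    """Route trigger events to recipients by their preferred mode."""

    def __init__(self, subscriptions):
        # subscriptions: list of (recipient, mode), mode is "immediate" or "batch"
        self.subscriptions = subscriptions
        self.pending = defaultdict(list)  # batched events awaiting the next report
        self.sent = []                    # (recipient, [events]) actually delivered

    def on_trigger(self, event):
        for recipient, mode in self.subscriptions:
            if mode == "immediate":
                self.sent.append((recipient, [event]))   # eg, a page
            else:
                self.pending[recipient].append(event)    # held for the batch

    def flush_batches(self):
        """Deliver one combined report per recipient, then reset."""
        for recipient, events in self.pending.items():
            if events:
                self.sent.append((recipient, list(events)))
        self.pending.clear()


# Made-up roles and events for illustration only.
notifier = Notifier([("coordinator", "immediate"), ("investigator", "batch")])
notifier.on_trigger("MRN001 admitted")
notifier.on_trigger("MRN002 admitted")
notifier.flush_batches()  # would run on a timer, eg, every few hours
print(notifier.sent)
```

Both modes operate on the same trigger stream, so different staff on one project can receive the same events at different cadences.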

Given that real-time monitoring of clinical and laboratory messaging30–32 and real-time data warehousing43 have been proposed and implemented for biosurveillance purposes, such platforms44 could be leveraged to create notification of potential recruits as a side benefit of real-time extract-transform-load processing. In such a scenario, clinical data feeds would be loaded in real time to give researchers full access to all data points when developing a recruitment query. This ‘all or nothing’ approach would be highly problematic for many organizations, because the cost of real-time warehousing increases sharply with the amount of data incorporated and the frequency of data refresh rates.45 In contrast, the DISCERN model allows us to focus real-time data capture only where it is needed, using a free, open-source application. DISCERN thus offers a tractable and scalable form of real-time data warehousing.

Limitations and lessons learned

The DISCERN framework is easily configurable and most extensible to organizations that have both an enterprise feed of HL7 messages and a clinical data warehouse. However, it is not a stand-alone tool but a service provided by a technical team. At our organization, the initial consultation between the DISCERN developer and the end user is critical both for ensuring success and for verifying that there is a legitimate need for real-time data that exceeds the capabilities of a self-service researcher query tool such as our DEDUCE platform or i2b2.

As the service increases in popularity, issues of scalability may present challenges, and our organization may need to hire more staff to translate and deploy customer requests. At present, however, our single DISCERN developer spends only 8 h/week in service of 12 projects, whereas the six analysts providing traditional ad hoc warehouse extract services consume 30 h weekly.

This enhanced efficiency achieved by DISCERN is due to several factors. Once a channel is built, it is easily reused for other projects and configurable in the drag-and-drop Mirth interface. Any associated JavaScript is easily modifiable to suit a new yet familiar query request. Once a DISCERN query is released, it persists without maintenance until explicitly deactivated. Finally, as described above, a preconfigured, commonly requested DISCERN task has been added for self-service within the DEDUCE portal. For reasons related to data-provenance concerns, we believe it inadvisable to give researchers unlimited access to a DISCERN-like tool; however, we expect to add more self-service tasks in the future to complement the ad hoc DISCERN service.

The DISCERN workflow is a revolutionary but disruptive technology that fundamentally changes the ground rules for subject recruitment. Because of efficiencies potentially afforded by DISCERN, its use can exacerbate competition for patients. As patients generally are not enrolled in multiple trials, negotiating the governance of DISCERN across multiple studies and clinics presents ongoing challenges, and we are presently establishing a governance structure to address these important issues. If clinicians dislike competing for subjects to enroll in their trials, they may reject the application, even if it significantly enhances recruitment.

The cultural and anthropologic issues surrounding tools such as DISCERN deserve independent study and are not meant to be addressed by the present work. However, we note that no technology, including DISCERN, can address other key obstacles to recruitment, such as lack of patient interest in the trial or a primary care doctor who is uncomfortable referring the patient.

A final major limitation of this work is that it is too early to judge definitively whether DISCERN has meaningfully increased the efficiency of clinical trial recruitment across Duke Medicine. The studies using DISCERN are ongoing, and the perceived efficacy of the approach among our clinical research partners has engendered a reluctance to recruit patients without using the technology. We plan to conduct a randomized controlled trial in which one study coordinator group uses DISCERN and another uses traditional methods, but have yet to find a clinical research team willing to reduce their DISCERN usage to allow for this comparison. In addition, deriving sensitivity and specificity statistics would require capture of false negatives (ie, real ‘hits’ missed by DISCERN) as well as true negatives (ie, the number of all potential hits correctly rejected by DISCERN). We are evaluating a number of DISCERN projects for the potential to conduct simultaneous manual chart review to collect false negatives, but only those query criteria easily identified through the EHR workflow can be evaluated using this method. Identification of true negatives, meanwhile, would require hardware sufficient to collect all HL7 messages and store them for later parsing by a scripting language.
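For reference, the standard definitions show why both counts are needed: sensitivity depends on false negatives and specificity on true negatives. The sketch below uses the textbook formulas; the counts are invented purely to exercise the functions and are not DISCERN results.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly eligible patients that were flagged: TP / (TP + FN)."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no eligible patients observed")
    return true_pos / total


def specificity(true_neg, false_pos):
    """Fraction of ineligible candidates correctly rejected: TN / (TN + FP)."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no ineligible candidates observed")
    return true_neg / total


# Invented counts for illustration only; not measured values.
example_sens = sensitivity(true_pos=45, false_neg=5)
example_spec = specificity(true_neg=900, false_pos=100)
print(example_sens, example_spec)
```

Without chart review to count missed patients, and without stored messages to count correct rejections, neither denominator can be completed.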

Future work

As noted above, in cases where many different DISCERN requests follow the same pattern or recipe, developers will incorporate a DISCERN feature into the DEDUCE research portal. This integration of DISCERN into DEDUCE infrastructure allows centralized management of both retrospective and prospective data, potentially yielding economies of scale and freeing study investigators from burdensome data-management tasks. If this proves successful, we may extend DISCERN to the external research community by integrating it with the i2b2 hive structure currently in place at Duke Medicine that shares data with other National Institutes of Health Clinical and Translational Science Award holders. In this way, interinstitution i2b2 cohorts could be refined using real-time information.

Eventually, we plan to add functionality that would allow targeting of relevant clinical trial opportunities to patients' health portal accounts, which provide Web access to lab results, vital measurements and medical history, upcoming appointments, online bill payment, and appointment scheduling.

Conclusions

Complementary approaches are needed to harness both retrospective and real-time data to identify potential study recruits and alert appropriate staff. The DISCERN framework serves a variety of clinical trial recruitment scenarios, including problematic variations in which study eligibility hinges on a confluence of specifications, clinical events, and temporal dependencies. In contrast to other cohort recruitment tools, the DISCERN model offers a highly flexible solution for prospective recruitment, with architecture extensible to any other organization that relies on HL7 messaging among clinical data systems. Our experience may help others seeking to deploy similar systems achieve success more quickly. More study is needed to definitively evaluate whether DISCERN goes beyond patient identification to measurably increase the speed of successful recruitment into clinical trials.

Acknowledgments

The authors thank the Duke University Health System Information Management Group, R Califf, R Goldberg, D Tanaka, J Eckstrand, M Cotton, W Steinbach, J Wynn, B Vickery and C Walters for design guidance and the sharing of DISCERN use cases. We also thank E Hammond for critical review of the manuscript.

Footnotes

Funding: This publication was made possible by Grant Number UL1 RR024128 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH), and NIH Roadmap for Medical Research. Its contents are solely the responsibility of the authors and do not necessarily represent the official view of NCRR or NIH. Information on NCRR is available at http://www.ncrr.nih.gov/. Information on Re-engineering the Clinical Research Enterprise can be obtained from http://nihroadmap.nih.gov/clinicalresearch/overview-translational.asp.

Competing interests: None.

Provenance and peer review: Not commissioned; externally peer reviewed.

References

  • 1. Lovato LC, Hill K, Hertert S, et al. Recruitment for controlled clinical trials: literature summary and annotated bibliography. Control Clin Trials 1997;18:328–52
  • 2. Watson JM, Torgerson DJ. Increasing recruitment to randomised trials: a review of randomised controlled trials. BMC Med Res Methodol 2006;6:34.
  • 3. Section A, Title XIII, of the American Recovery and Reinvestment Act of 2009 (HITECH Act), 2009. frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=111_cong_bills&docid=f:h1enr.pdf (accessed 22 Dec 2010).
  • 4. Hersh WR. Adding value to the electronic health record through secondary use of data for quality assurance, research, and surveillance. Am J Manag Care 2007;13:277–8
  • 5. Institute of Medicine Committee on Quality of Health Care in America. Chapter 7: Using information technology. In: Crossing the Quality Chasm: A New Health System for the 21st Century. Washington, DC: National Academy Press, 2001
  • 6. Safran C, Bloomrosen M, Hammond EW, et al. Towards a National Framework for the Secondary use of Health Data: A Report of the Working Conference of the American Medical Informatics Association (2006). https://www.amia.org/files/workforthesecondaryuseofhealthdata_09_08_06_.pdf (accessed 22 Dec 2010).
  • 7. Gren L, Broski K, Childs J, et al. Recruitment methods employed in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. Clin Trials 2009;6:52–9
  • 8. Marks L, Power E. Using technology to address recruitment issues in the clinical trial process. Trends Biotechnol 2002;20:105–9
  • 9. Haidich AB, Ioannidis JP. Effect of early patient enrollment on the time to completion and publication of randomized controlled trials. Am J Epidemiol 2001;154:873–80
  • 10. Friedman LM, Furberg CD, DeMets DL. Fundamentals of Clinical Trials. 3rd edn New York: Springer Scientific Publishing, 1998
  • 11. Sung NS, Crowley WF, Genel M, et al. Central challenges facing the national clinical research enterprise. JAMA 2003;289:1278–87
  • 12. Carter RE, Sonne SC, Brady KT. Practical considerations for estimating clinical trial accrual periods: application to a multi-center effectiveness study. BMC Med Res Methodol 2005;5:11.
  • 13. Campbell MK, Snowdon C, Francis D, et al. Recruitment to randomised trials: strategies for trial enrollment and participation study. The STEPS study. Health Technol Assess 2007;11:iii, ix–105.
  • 14. Barnard KD, Dent L, Cook A. A systematic review of models to predict recruitment to multicentre clinical trials. BMC Med Res Methodol 2010;10:63.
  • 15. Butte AJ, Weinstein DA, Kohane IS. Enrolling patients into clinical trials faster using RealTime Recruiting. Proc AMIA Symp 2000:111–15
  • 16. Turchin A, Pendergrass ML, Kohane IS. DITTO—a tool for identification of patient cohorts from the text of physician notes in the electronic medical record. AMIA Annu Symp Proc 2005:744–8
  • 17. Weiner DL, Butte AJ, Hibberd PL, et al. Computerized recruiting for clinical trials in real time. Ann Emerg Med 2003;41:242–6
  • 18. Embi PJ, Jain A, Clark J, et al. Effect of a clinical trial alert system on physician participation in trial recruitment. Arch Intern Med 2005;165:2272–7
  • 19. Embi PJ, Jain A, Clark J, et al. Development of an electronic health record-based Clinical Trial Alert system to enhance recruitment at the point of care. AMIA Annu Symp Proc 2005:231–5
  • 20. Embi PJ, Payne PR, Kaufman SE, et al. Identifying challenges and opportunities in clinical research informatics: analysis of a facilitated discussion at the 2006 AMIA Annual Symposium. AMIA Annu Symp Proc 2007:221–5
  • 21. Afrin LB, Oates JC, Boyd CK, et al. Leveraging of open EMR architecture for clinical trial accrual. AMIA Annu Symp Proc 2003:16–20
  • 22. Cardozo E, Meurer WJ, Smith BL, et al. Utility of an automated notification system for recruitment of research subjects. Emerg Med J 2010;27:786–7
  • 23. Grundmeier RW, Swietlik M, Bell LM. Research subject enrollment by primary care pediatricians using an electronic health record. AMIA Annu Symp Proc 2007:289–93
  • 24. Kamal J, Pasuparthi K, Rogers P, et al. Using an information warehouse to screen patients for clinical trials: a prototype. AMIA Annu Symp Proc 2005:1004.
  • 25. MacLean CD, Littenberg B, Gagnon M, et al. The Vermont Diabetes Information System (VDIS): study design and subject recruitment for a cluster randomized trial of a decision support system in a regional sample of primary care practices. Clin Trials 2004;1:532–44
  • 26. Oberg R, Rasmussen L, Melski J, et al. Evaluation of the google search appliance for patient cohort discovery. AMIA Annu Symp Proc 2008:1104.
  • 27. Horvath MM, Winfield S, Evans S, et al. The DEDUCE Guided Query tool: Providing simplified access to clinical data for research and quality improvement. J Biomed Inform 2011;44:266–76
  • 28. Verma R, Harper J. Life cycle of a data warehousing project in healthcare. J Healthc Inf Manag 2001;15:107–17
  • 29. Kuperman GJ, Teich JM, Tanasijevic MJ, et al. Improving response to critical laboratory results with automation: results of a randomized controlled trial. J Am Med Inform Assoc 1999;6:512–22
  • 30. Tsui FC, Espino JU, Dato VM, et al. Technical description of RODS: a real-time public health surveillance system. J Am Med Inform Assoc 2003;10:399–408
  • 31. Reis BY, Kirby C, Hadden LE, et al. AEGIS: a robust and scalable real-time public health surveillance system. J Am Med Inform Assoc 2007;14:581–8
  • 32. Mandl KD, Overhage JM, Wagner MM, et al. Implementing syndromic surveillance: a practical guide informed by the early experience. J Am Med Inform Assoc 2004;11:141–50
  • 33. Bortis G. Experience with Mirth: An open Source Health Care Integration Engine. 2008 ICSE Experience Track on Software Engineering in Health Care, 2008. http://www.ics.uci.edu/~gbortis/papers/gbortis_mirth2008.pdf (accessed 20 Jul 2011).
  • 34. Lowe HJ, Ferris TA, Hernandez PM, et al. STRIDE—An integrated standards-based translational research informatics platform. AMIA Annu Symp Proc 2009;2009:391–5
  • 35. Murphy SN, Weber G, Mendis M, et al. Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2). J Am Med Inform Assoc 2010;17:124–30
  • 36. Penberthy L, Brown R, Puma F, et al. Automated matching software for clinical trials eligibility: measuring efficiency and flexibility. Contemp Clin Trials 2010;31:207–17
  • 37. Brooks CJ, Stephens JW, Price DE, et al. Use of a patient linked data warehouse to facilitate diabetes trial recruitment from primary care. Prim Care Diabetes 2009;3:245–8
  • 38. Seyfried L, Hanauer DA, Nease D, et al. Enhanced identification of eligibility for depression research using an electronic medical record search engine. Int J Med Inform 2009;78:e13–18
  • 39. Morris AD, Boyle DI, MacAlpine R, et al. The diabetes audit and research in Tayside Scotland (DARTS) study: electronic record linkage to create a diabetes register. DARTS/MEMO Collaboration. BMJ 1997;315:524–8
  • 40. Weber S, Lowe HJ, Malunjkar S, et al. Implementing a real-time complex event stream processing system to help identify potential participants in clinical and translational research studies. AMIA Annu Symp Proc 2010;2010:472–6
  • 41. Nelson JL, Smith BL, Jared JD, et al. Prospective trial of real-time electronic surveillance to expedite early care of severe sepsis. Ann Emerg Med 2011;57:500–4
  • 42. Ahmad F, Gupta R, Kurz M. Real time electronic patient study enrollment system in emergency room. AMIA Annu Symp Proc 2005:881.
  • 43. Berndt DJ, Fisher JW, Craighead JG, et al. The role of data warehousing in bioterrorism surveillance. Decis Support Syst 2007;43:1383–403
  • 44. Santos RJ, Bernardino J. Real-time data warehouse loading methodology. IDEAS '08: Proceedings of the 2008 International Symposium on Database Engineering & Applications. New York, NY: Association for Computing Machinery, 2008. doi:10.1145/1451940.1451949
  • 45. Agosta L, Gile K. Real-time Data Warehousing: the Hype and the Reality, 2004. http://www.forrester.com/rb/Research/real-time_data_warehousing_hype_and_reality/q/id/36076/t/2 (accessed 13 Apr 2011).

Articles from Journal of the American Medical Informatics Association : JAMIA are provided here courtesy of Oxford University Press
