Abstract
To ease the secondary use of clinical data in clinical research, we introduce ClinData Express, a metadata-driven, web-based clinical data management system. ClinData Express consists of two parts: 1) m-designer, a standalone application for metadata definition; and 2) a web-based data warehouse system for data management. Researchers only need to define the metadata and data model in m-designer; the web interface for data collection and the project-specific database for data storage are then generated automatically. The standards used in the system and the data export module support data reuse. The system has been tested on seven disease data collections in Chinese and one form from dbGaP. Its flexibility gives it considerable potential for use in clinical research. The system is available at http://code.google.com/p/clindataexpress.
Introduction
The field of clinical research increasingly depends on the availability of complete, accurate, and aggregated clinical care data(1). Clinical care generates large volumes of important data, which are critical not only to the continuity of care for patients but also to clinical research(2–4). Secondary use of daily clinical care data is necessary for quality improvement, post-market drug surveillance, clinical research, and public health(3,5).
To ease the secondary use of clinical data, it is necessary to collect data once and reuse them many times(6). To achieve this goal, secondary use of clinical data needs to focus on easy data access, storage, control, and sharing(5). Data management is therefore a key building block for the reuse of clinical and health data at national and international scale(3).
Electronic data capture (EDC), used in place of paper-based case report forms (CRFs), is a powerful tool for faster data capture and greater research efficiency(2), and it has been particularly successful when multiple data collection sites are involved(5). It facilitates data collection, reporting, query, and validation, and sometimes data analysis(6). EDC systems are now prevalent in clinical research: an estimated 41% of phase II/III/IV clinical trials were using an EDC system. Trials funded by academic institutions, government, and foundations are less likely to use an EDC system than those sponsored by industry, and larger trials tend to be more likely to adopt EDC(5). Applying EDC technology to the secondary use of data is therefore a reasonable and attractive option.
However, some important issues have not been fully addressed by EDC systems, such as the clinical trial context, the user interface, standards, data reuse, and cost(7–11). Electronic case report forms must be designed to suit the data collection task throughout the research and must be changeable flexibly and quickly across all sites when needed. User control and freedom make interaction between the trial and the data management system easier and can dramatically increase efficiency(8,10). Standardized documentation, using standards such as HL7 CDA to store medical research data, can contribute to high interoperability and easier integration of health care information systems(1). Integrating clinical research data with clinical care information, or reusing clinical trial data, can lead to a fuller understanding of some medical problems; however, ‘collect once, use many’ still leaves some open questions(11,12). The cost of a data management system is also still one of the problems blocking the use of electronic data capture in clinical trials, especially in small trials and in some developing countries(9). Clinical data management systems that offer an easy user interface, data integration, data sharing, and cost savings are therefore needed for efficient and effective clinical trial data management.
Here we introduce ClinData Express, a metadata-driven, web-based clinical research data management system for secondary use of clinical data. We have applied the system in a public health and clinical research project; the results indicate that ClinData Express facilitates data collection and management in clinical research and has potential for use in more domains.
Features of ClinData Express
ClinData Express was developed to facilitate data collection, storage, and management in clinical trials and research. The following key features were identified as critical for supporting clinical research.
1. Flexible design of clinical case report forms
Metadata is data about data: it describes a single data element or a group of data elements. m-designer, the data model design module, helps researchers define the data elements and data structures of all the data they want to collect. The use of common data elements (CDEs) supports data sharing among multiple clinical studies.
2. Numerous field types and data groups
To meet the needs of HTML form building, 13 field types are provided, including multiple choice, boolean, integer, string, and real number. Moreover, one or more data elements can be logically grouped into a data group.
3. Data validation
In ClinData Express, XForms is used to build the HTML forms. Data types and data ranges are validated automatically to reduce data entry errors.
4. On-the-fly data query
ClinData Express is a server-side, web-based application. The only software required to manage the data is a web browser connected to the internet or intranet, which keeps installation to a minimum. Once data are entered into the system, users can query and check them immediately. In addition, all data are stored in a central database, and backups are performed in the central system.
5. Standardized documentation and sharing
Every data element in the CRFs is assigned a code from a coding system, such as SNOMED CT, which facilitates data integration and exchange.
6. Natural language processing (NLP) module
The NLP module can automatically extract data elements from the corresponding narrative elements, such as operation notes. This is useful for maintaining compatibility with the HL7 CDA format.
Workflow of ClinData Express
Step 1: Define metadata for CRFs
Metadata is the description of data; here it describes the structure of the CRFs. At the beginning of a clinical study, the first and only thing researchers have to do is define the metadata for each field and data group.
Step 2: Automatically build HTML form for CRF
ClinData Express can automatically generate the HTML form for CRFs according to the metadata defined in Step 1.
Step 3: Data management based on CRF
Researchers can now input, modify, store, and query data.
Moreover, researchers can modify the data model at any time during the research without information technology (IT) support. After revising the model, researchers simply resubmit it, and all existing data are transformed to the new model automatically (Fig. 1).
Figure 1.
The workflow of ClinData Express. What research teams do is highlighted in the orange square.
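The paper does not describe how existing data are mapped onto a revised model. As a purely illustrative sketch, one plausible rule is to keep the values of elements that survive the revision, drop elements that were removed, and mark newly added elements as N/A. The function name `migrate_record`, the tag names, and the rule itself are assumptions, not the ClinData Express implementation.

```python
def migrate_record(record, new_model_tags, missing="N/A"):
    """Project an existing record onto a revised data model.

    Assumed rule (not taken from the paper): values of retained tags are
    kept, tags absent from the new model are dropped, and tags that are
    new in the model are filled with the N/A marker.
    """
    return {tag: record.get(tag, missing) for tag in new_model_tags}


# Hypothetical record collected under the old model.
old_record = {"name": "Patient A", "age": 57, "fax": "none"}

# Hypothetical revised model: "fax" removed, "follow_up_date" added.
new_tags = ["name", "age", "follow_up_date"]

migrated = migrate_record(old_record, new_tags)
```

Under this assumed rule, no collected value is altered; only the set of elements changes.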
The XForms technology used in ClinData Express makes it possible to separate data model building from data display configuration. This is very useful in clinical trials, because the data structures and datasets (the focus of the trials) are kept separate from the design of the display forms. Researchers can focus on data collection instead of database building and display styles.
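The separation of data model and display can be illustrated with a small sketch in which a form is rendered entirely from a metadata list. For brevity it emits plain HTML controls rather than XForms markup, and the field names, the dictionary layout of `MODEL`, and the functions `render_field` and `render_form` are hypothetical, not part of ClinData Express.

```python
from html import escape

# Hypothetical data model: only names, types and ranges, no layout.
MODEL = [
    {"tag": "name", "label": "Name", "type": "string"},
    {"tag": "age", "label": "Age", "type": "integer", "min": 0, "max": 150},
    {"tag": "smoker", "label": "Smoker", "type": "boolean"},
]


def render_field(field):
    """Choose a control for one element based on its declared type."""
    tag = escape(field["tag"], quote=True)
    label = escape(field["label"])
    if field["type"] == "boolean":
        control = (f'<select name="{tag}">'
                   "<option>yes</option><option>no</option></select>")
    elif field["type"] == "integer":
        # Range limits from the metadata become input constraints.
        bounds = "".join(f' {k}="{field[k]}"' for k in ("min", "max") if k in field)
        control = f'<input type="number" name="{tag}"{bounds}>'
    else:
        control = f'<input type="text" name="{tag}">'
    return f"<label>{label} {control}</label>"


def render_form(model):
    """Render the whole form; the model itself carries no display logic."""
    rows = "\n".join(render_field(f) for f in model)
    return f"<form>\n{rows}\n</form>"


form_html = render_form(MODEL)
```

Changing the display style in this sketch only requires editing `render_field`; the model list stays untouched, which mirrors the separation described above.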
Data structure in ClinData Express
The two essential data structures in ClinData Express are the data element and the data group. A data element is the minimum unit for describing a CRF, such as the first name, the address, or the telephone number of a patient. ClinData Express provides 13 data types that are in routine use in the clinical domain. Elements used in multiple trials can be defined as common data elements (CDEs).
Each data element is defined by metadata, such as its type and unique id, and is assigned a standard code id from a code system such as HL7 or SNOMED CT for document transformation and sharing. ClinData Express also allows NULL and N/A values: an element can be marked NULL when it has no value, or N/A when it is not known whether it has a value.
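A data element with its metadata and the two missing-value markers might be modeled as follows. This is a minimal sketch: the class and field names are illustrative, and the code ids are placeholders rather than real SNOMED CT codes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Missing(Enum):
    """The two 'no value' markers described in the text."""
    NULL = "NULL"  # the element has no value
    NA = "N/A"     # it is not known whether the element has a value


@dataclass
class DataElement:
    # Field names are illustrative; they follow the metadata items the
    # paper lists as required for a data element.
    show_name: str
    tag_name: str
    element_type: str
    code_system_id: str
    code_id: str
    value: Union[str, int, float, bool, Missing] = Missing.NA

    def has_value(self) -> bool:
        """True when the element holds real data rather than a marker."""
        return not isinstance(self.value, Missing)


# Placeholder code ids, not real SNOMED CT codes.
age = DataElement("Age", "age", "integer", "SNOMED CT", "EXAMPLE-0001", 57)
phone = DataElement("Telephone", "telephone", "string", "SNOMED CT",
                    "EXAMPLE-0002", Missing.NULL)
```

Keeping NULL and N/A as distinct markers, rather than a single empty value, preserves the difference between "known to be absent" and "unknown" when the data are later analyzed.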
A data group is a logical combination of data elements that together describe a certain item. For example, the name, age, address, and telephone number can be assembled into the ‘basic information group’. There are five types of groups in ClinData Express: the general group (GG), the multiple choice group (MCG), the dynamic group (DG), the dynamic table group (DTG), and the NLP group (NLPG). The GG is the basic group for logical and semantic grouping. The MCG is designed to store multiple choice data; each data element in an MCG is defined as a Boolean type. The DG holds a collection of elements that can be recorded one or more times for a given participant during the clinical research, such as follow-up information. The DTG is similar to the DG but is displayed in table format. The NLPG is specialized for NLP tools: in an NLP group, users can define an NLP element to store the text for the NLP program and other elements to store the NLP results.
All data are organized in a hierarchical structure, shown in Fig. 2. Each node can be an element or a group. The data structures are stored in XML format, with special tags to identify them.
Figure 2.
The data element dictionary shown as a tree hierarchy (left) and in XML format (right).
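To make the element/group hierarchy concrete, the sketch below builds a small dictionary as an XML tree and lists the path of every element. The tag and attribute names (`group`, `element`, `tag`, `type`) are assumptions chosen for illustration; the actual ClinData Express XML vocabulary is not given in the paper.

```python
import xml.etree.ElementTree as ET


def build_element(tag, element_type):
    """Create a leaf node describing one data element."""
    return ET.Element("element", {"tag": tag, "type": element_type})


def build_group(tag, group_type, children):
    """Create a group node (e.g. GG, MCG, DG) holding elements or groups."""
    group = ET.Element("group", {"tag": tag, "type": group_type})
    group.extend(children)
    return group


def element_paths(node, prefix=""):
    """Walk the tree depth-first and return the path of every element."""
    path = f"{prefix}/{node.get('tag')}"
    if node.tag == "element":
        return [path]
    paths = []
    for child in node:
        paths.extend(element_paths(child, path))
    return paths


# The 'basic information group' from the text, as a general group (GG).
basic = build_group("basic_information", "GG", [
    build_element("name", "string"),
    build_element("age", "integer"),
    build_element("address", "string"),
    build_element("telephone", "string"),
])
dictionary = build_group("dictionary", "GG", [basic])

xml_text = ET.tostring(dictionary, encoding="unicode")
paths = element_paths(dictionary)
```

Because groups may contain other groups, the same two node kinds are enough to represent a dictionary of any depth.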
User interface
ClinData Express is made up of two parts: 1) the CRF design module, m-designer, a standalone application written in Java; and 2) a web-based clinical information warehouse (CIW) system.
m-designer
m-designer, short for model designer, is used to define the metadata, generate the code for the data and display models, and upload the code and models to the server. The m-designer user interface has three parts (Fig. 3). The tool bar at the top provides file, operate, edit, and help tools. Below the tool bar, the data element tree is shown on the left and the edit panel on the right.
Figure 3.
User interface of m-designer. Screenshots: 1, the data element tree; 2, editing panel; 3, menu.
Clicking an item in the data element tree opens it for editing; different edit views are shown for the data dictionary, data elements, and data groups. When a data element is created, a set of metadata is required to describe it. For a data group, the show name, tag name, and data type attributes are required; for a data element, the show name, tag name, code system id, code id, and element type are required. The code system id identifies the standard code system used in the trial, such as SNOMED CT, while the code id is the code of the element in that code system.
ClinData Express CIW system
The CIW system is a web-based clinical data management system that can collect, modify, view, index, and query data. It is a server-side, web-based application, which means that only a web browser is required on the user side.
A quick launch tool bar is placed on the left side of the main page of the CIW system (Fig. 4). The “input the new data” link provides the mechanism for entering new data into the database. Each project is accessible to users who have been granted sufficient access privileges by the research team. The “view the data” link allows users to view the data of each collection. The selected collection is shown in red, and each record of this collection is shown in the lower right frame of the page. The “modify the data” link provides links to modify and export the data of each collection. The collections are shown in the right frame, with the selected one in red. Researchers use the data export links that follow each data collection and each record to export data from ClinData Express in ARFF format for external data analysis. The “create index” and “data query” links provide full-text indexing and searching across all data collections.
Figure 4.
Screenshots of ClinData Express CIW system. A, input new entry; B, data validation
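The ARFF export mentioned above can be sketched as follows. ARFF is the plain-text input format of WEKA, with an `@relation` header, one `@attribute` line per column, and comma-separated rows under `@data`, where `?` marks a missing value. The function `to_arff` and the example attributes are hypothetical and only illustrate the format, not the export code of ClinData Express.

```python
def to_arff(relation, attributes, rows):
    """Serialize rows to ARFF text.

    attributes: list of (name, arff_type) pairs, e.g. ("age", "numeric").
    rows: list of dicts keyed by attribute name; None means missing.
    """
    lines = [f"@relation {relation}", ""]
    for name, arff_type in attributes:
        lines.append(f"@attribute {name} {arff_type}")
    lines += ["", "@data"]
    for row in rows:
        cells = []
        for name, arff_type in attributes:
            value = row.get(name)
            if value is None:
                cells.append("?")  # ARFF marker for a missing value
            elif arff_type == "string":
                # Quote free text so commas and spaces stay in one cell.
                text = str(value).replace("\\", "\\\\").replace("'", "\\'")
                cells.append("'" + text + "'")
            else:
                cells.append(str(value))
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


# Hypothetical attributes and records, for illustration only.
attributes = [("age", "numeric"), ("metastasis", "{yes,no}"), ("note", "string")]
rows = [
    {"age": 57, "metastasis": "yes", "note": "stable"},
    {"age": 63, "metastasis": None, "note": "no follow-up"},
]
arff_text = to_arff("colorectal_demo", attributes, rows)
```

A file written this way can be opened directly in WEKA for the kind of external analysis described in the text.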
Fig. 5 shows a typical screenshot of data entry. Clickable group tags are shown on the top of each screen, which point to the groups of the first level in the data structure hierarchy.
Figure 5.
Screenshot for ClinData Express data entry.
Each form contains field-specific validation code to ensure strong data integrity. In addition to checking the element type (e.g. the birthday is defined as a date type), the system also checks the range of each numeric element (e.g. the range of age is defined as 0 to 150). Data fields may be shown as text fields (e.g. string elements) or as embedded pull-down boxes (e.g. Boolean elements). For the dynamic group and dynamic table group, the “Insert An Item” and/or “Delete An Item” buttons follow the group members; in the NLP group, the “NLP process” button follows the NLP group member.
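The type and range checks described above can be sketched as a small validation function. In ClinData Express these checks are carried out by the generated forms; the function below is only an illustration of the logic, and its name, signature, and the ISO date format are assumptions.

```python
from datetime import date


def validate(value, element_type, minimum=None, maximum=None):
    """Return a list of error messages; an empty list means the value passed."""
    errors = []
    if element_type == "integer":
        try:
            number = int(value)
        except (TypeError, ValueError):
            return [f"{value!r} is not an integer"]
        if minimum is not None and number < minimum:
            errors.append(f"{number} is below the minimum of {minimum}")
        if maximum is not None and number > maximum:
            errors.append(f"{number} is above the maximum of {maximum}")
    elif element_type == "date":
        try:
            # Assumes ISO format (YYYY-MM-DD); impossible dates are rejected.
            date.fromisoformat(value)
        except (TypeError, ValueError):
            errors.append(f"{value!r} is not a valid date")
    return errors


age_ok = validate("42", "integer", minimum=0, maximum=150)
age_bad = validate("200", "integer", minimum=0, maximum=150)
birthday_bad = validate("1970-02-30", "date")
```

Returning a list of messages rather than raising on the first problem lets a form report every failed check for a field at once.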
CIW system architecture
The CIW system is implemented with JSP and JavaScript. MySQL is used for data storage and management, and Apache Tomcat is used as the web server and JSP container. The MySQL database contains two tables: one for the basic information of each data collection and the other for the data files.
Hardware and software requirements are modest, and the system runs in Windows XP or Linux environments.
All forms are designed with XForms, a newer generation of forms technology for the World Wide Web. XForms simplifies form design for web pages because it separates data from display style, and an XForms form can be described in an XML file. When a CRF is designed, the data model and the display model are generated automatically. The data model is stored as an XML schema, according to which the data file is generated. All data elements are assembled into an XML file, which is then stored in the MySQL database as an XML object.
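The two-table layout with XML data files can be sketched as follows. To keep the example self-contained it uses an in-memory SQLite database as a stand-in for MySQL, and the table names, column names, and record structure are assumptions rather than the actual ClinData Express schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory SQLite stands in for the MySQL server used by the real system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE collection (            -- basic information of a data collection
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE data_file (             -- one XML document per stored record
    id            INTEGER PRIMARY KEY,
    collection_id INTEGER NOT NULL REFERENCES collection(id),
    record_xml    TEXT NOT NULL
);
""")

# Build a hypothetical record as XML.
record = ET.Element("record")
ET.SubElement(record, "age").text = "57"

cur = conn.execute("INSERT INTO collection (name) VALUES (?)",
                   ("colorectal_demo",))
collection_id = cur.lastrowid
conn.execute(
    "INSERT INTO data_file (collection_id, record_xml) VALUES (?, ?)",
    (collection_id, ET.tostring(record, encoding="unicode")),
)

# Read the XML back and parse a value out of it.
stored = conn.execute(
    "SELECT record_xml FROM data_file WHERE collection_id = ?",
    (collection_id,),
).fetchone()[0]
age = int(ET.fromstring(stored).findtext("age"))
```

Storing each record as one XML document means the relational schema does not need to change when the data model is revised; only the XML content does.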
Results
The first version of ClinData Express was introduced to the public at the Shanghai Center for Bioinformation and Technology (SCBIT), Shanghai, China, in August 2008. The system has been tested on seven cancer surveillance data collections in Chinese and one form from dbGaP. It has now been successfully deployed at two scientific partners of SCBIT: the Shanghai Municipal Centers for Disease Control & Prevention (SCDC) and the Colorectal Cancer Department of Fudan University. One successful application of ClinData Express is a study of liver metastasis of colorectal cancer involving 1125 patients. All the information, comprising 437 data elements, was collected and stored in ClinData Express. Using the data exported in ARFF format, some important biomarkers were found with data mining methods in the WEKA environment(13).
Discussion
The most important factor in the successful implementation of ClinData Express is its minimal requirement for IT expertise. Without IT support, clinicians can manage clinical research data themselves, from metadata definition to data collection and management, and in particular data model modification during the research. Researchers can concentrate on data definitions and data structures; in other words, they focus on the scientific questions. Once the research managers have defined the metadata and data structures, they design the data model through the m-designer GUI, which is friendly and easy to operate. When the data model is generated in m-designer, the pages that display it are generated automatically as well. The research managers then deploy the data model to ClinData Express over the web, and the trial-specific database and data management web pages are generated simultaneously and automatically, so data collection can start at all sites. The whole process from data model definition to data collection is easy and time-saving. Online data collection also provides real-time data capture, which is very useful for collaborative, cross-organizational clinical trials and research.
Clear definition and reuse of clinical trial and research data are essential for data sharing between different groups within a study(14,15). Metadata is data about data; it facilitates the description and organization of data in a standard and stable way, so that data can be managed, queried, and understood easily regardless of changes in time and technology. When metadata is introduced into clinical trials and research, the names and types of the data are clearly defined, and data from one clinical study or research facility become readable by another without translation, because both use the same data definitions and data model.
In multi-site clinical trials, data sharing and transformation are common tasks that can leverage the individual strengths of every trial center. Data transformation requires a standard. ClinData Express requires at least one code system, so that all the data used in the trial can be standardized. In addition to popular standards such as SNOMED CT, ClinData Express allows users to define their own code systems, which adds flexibility to the trial.
Several limitations of ClinData Express should be considered, such as data security. Currently the system is secured by user roles. In the future, an independent security module should be added to make data in the system safer.
Conclusion
We have introduced ClinData Express, a metadata-driven clinical trial data management system that provides a web-based integrated platform with data sharing and data transformation capabilities. Although large clinical trial agencies can afford to fund the development of a new data collection system for a particular trial, small trials may face technical and financial obstacles. The metadata-driven database development and automatic web page generation in ClinData Express can help clinical researchers save money and time in the management and secondary use of clinical data in clinical research. The metadata and standards used in trials also support the reuse and sharing of data collected during the research.
Acknowledgments
This work is supported by grants from the National High Technology Research and Development Program of China (863 Program) (2012AA02A602), by grants from the Talents Developmental Fund of Shanghai in 2011 to ZF Li, and by the Shanghai Health Bureau scientific research fund project (20114182). It is also partially supported by the National Science and Technology Major Project (2012ZX09303013).
References
- 1. Klein A, T TG, Brinkmann L, Spitzer M, Ueckert F, Prokosch HU. Experiences with an Interoperable Data Acquisition Platform for Multi-Centric Research Networks Based on HL7 CDA. AMIA Annu Symp Proc; 2006. p. 986.
- 2. Sahoo U, Bhatt A. Electronic data capture (EDC)--a new mantra for clinical trials. Qual Assur. 2003 Dec;10(3–4):117–21. doi: 10.1080/10529410390892052.
- 3. Klein A, T TG, Brinkmann L, Spitzer M, Ueckert F, Prokosch HU. Experiences with an Interoperable Data Acquisition Platform for Multi-Centric Research Networks Based on HL7 CDA. AMIA Annu Symp Proc; 2006. p. 986.
- 4. Ohmann C, Kuchinke W. Future developments of medical informatics from the viewpoint of networked clinical research. Interoperability and integration. Methods Inf Med. 2009;48(1):45–54.
- 5. Thwin SS, Clough-Gorr KM, McCarty MC, Lash TL, Alford SH, Buist DS, et al. Automated inter-rater reliability assessment and electronic data collection in a multi-center breast cancer study. BMC Medical Research Methodology. 2007 Jun 18;7(1):23. doi: 10.1186/1471-2288-7-23.
- 6. El Emam K, Jonker E, Sampson M, Krleza-Jerić K, Neisa A. The use of electronic data capture tools in clinical trials: Web-survey of 259 Canadian trials. J. Med. Internet Res. 2009;11(1):e8. doi: 10.2196/jmir.1120.
- 7. James AW. Implementation of electronic data capture systems: Barriers and solutions. Contemporary Clinical Trials. 2007;28(3):329–36. doi: 10.1016/j.cct.2007.01.001.
- 8. Morak J, Schwetz V, Hayn D, Fruhwald F, Schreier G. Electronic data capture platform for clinical research based on mobile phones and near field communication technology. Conf Proc IEEE Eng Med Biol Soc. 2008;2008:5334–7. doi: 10.1109/IEMBS.2008.4650419.
- 9. Fegan GW, Lang TA. Could an Open-Source Clinical Trial Data-Management System Be What We Have All Been Looking For? PLoS Med. 2008;5(3):e6. doi: 10.1371/journal.pmed.0050006.
- 10. Choi B, Drozdetski S, Hackett M, Lu C, Rottenberg C, Yu L, et al. Usability comparison of three clinical trial management systems. AMIA Annu Symp Proc; 2005. p. 921.
- 11. Prokosch HU, Ganslandt T. Perspectives for medical informatics. Reusing the electronic medical record for clinical research. Methods Inf Med. 2009;48(1):38–44.
- 12. Cimino JJ. Collect once, use many. Enabling the reuse of clinical data through controlled terminologies. J AHIMA. 2007 Feb;78(2):24–29. quiz 31–32.
- 13. Frank E, Hall M, Trigg L, Holmes G, Witten IH. Data mining in bioinformatics using Weka. Bioinformatics. 2004 Oct 12;20(15):2479–81. doi: 10.1093/bioinformatics/bth261.
- 14. Crichton C, Davies J, Gibbons J, Harris S, Tsui A, Brenton J. Metadata-driven software for clinical trials. Proceedings of the 2009 ICSE Workshop on Software Engineering in Health Care; 2009. pp. 1–11.
- 15. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap)--a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009 Apr;42(2):377–81. doi: 10.1016/j.jbi.2008.08.010.





