DaT3M: A Data Tracker for Multi-faceted Management of Multi-site Clinical Research Data Submission, Curation, Master Inventorying, and Sharing

Shiqiang Tao; Licong Cui; Wei-Chun Chou; Samden Lhatoo; Guo-Qiang Zhang

2022 May 23;2022:466–475.

PMCID: PMC9285149 PMID: 35854726

Abstract

Managing research data is an important and challenging aspect of clinical studies, especially for multi-site collaboratives. To address this challenge, we designed, developed and deployed a multi-faceted, multi-level interactive data tracker (DaT3M) for multi-site clinical research data submission, curation, master inventorying, and sharing. Components of DaT3M include data overview, data portal, data status panel, data query engine, and data downloader. DaT3M managed clinical research data for the Center for SUDEP Research (CSR). The CSR instance of DaT3M includes 2,743 subjects from seven data contributing institutions, 7 data modalities and 10,678 data components: 3,398 Epilepsy Monitoring Unit reports, 3,440 electroencephalography recordings, 629 MRI imaging datasets, 177 bio-chemistry datasets, 722 DNA datasets, 2,289 follow-up forms, and 30 SUDEP forms. Preliminary, structured, one-on-one usability evaluations were performed with 7 researchers from four institutions. System Usability Score reached 85.3, showing that DaT3M has achieved high levels of user satisfaction based on our pilot evaluation.

1 Introduction

Managing clinical research data following FAIR data principles^1,2 is an important and demanding task for clinical studies. The challenges involved become pronounced for large-scale multi-site collaboratives. While multi-site collaboratives bring people and resources together to provide better opportunities to solve complex research problems^3–5, they also bring new challenges for data management. Common challenges include the lack of data consistency and common data format, semantic interoperability, continuous data integration and availability, and those highlighted by FAIR⁶. Therefore, multi-site collaboratives require a modernized informatics infrastructure to manage clinical data that supports real-time data access by clinical investigators while achieving essential elements of FAIR.

In this paper, we present the design, development, and user evaluation of DaT3M, a Data Tracker for Multi-faceted Management of Multi-site clinical research data submission, curation, master inventorying, and sharing. To elevate the FAIRness of research data, we provide five main features in DaT3M: (1) a data overview interface; (2) an individual center data portal; (3) a data status panel for real-time status checking of multi-modal data; (4) a data query engine to build patient cohorts; and (5) a data slice downloader. The data overview interface provides a quick summary of the entire data collection in the central repository. It presents information such as the total number of complete datasets, the number of datasets from each individual institution, the data types available in each dataset, and the total number of datasets of each data type. This is the most frequently used function for coordinating data integration across institutions. The individual center data portal provides a dedicated workspace for each participating institution to manage its own contributed data. The data status panel is provided in the institutional portal and uses data visualization techniques to track the statuses of all data types for each subject. For large research data repositories, a query engine is necessary to build patient cohorts, that is, to identify the subset of patients with specific characteristics. It is critical for researchers to be able to find subgroups of study subjects for specific hypotheses^7,8. In DaT3M, we reuse MEDCIS⁹ to build patient cohorts. The data slice downloader delivers user-specified data subsets from the central repository for secondary data analysis.

Many tools are available for research data management, including Microsoft Excel, REDCap¹⁰, MACRO¹¹, Oracle Clinical¹², OpenClinica¹³ and OpenCDMS¹⁴. Although these software tools are widely used for research data management, we did not find a single tool that addresses all the data FAIRness challenges faced by large-scale multi-site collaboratives. Therefore, we developed DaT3M as an integrated web application that provides customized functions to support data submission, curation, master inventorying, and sharing for large-scale multi-site collaboratives.

DaT3M has been piloted to support data management for the Center for SUDEP Research (CSR)¹⁵, a National Institute of Neurological Disorders and Stroke (NINDS) funded Center Without Walls for Collaborative Research in the Epilepsies. DaT3M has integrated 10,678 data components of 7 different modalities for 2,743 patients from CSR’s seven data contributing institutions. The features of DaT3M make it well suited to supporting large-scale multi-site collaboratives.

2 Background

2.1 Center for SUDEP Research

CSR is a multi-site cross-disciplinary collaboration composed of researchers from 16 institutions across the United States and Europe to understand Sudden Unexpected Death in Epilepsy Patients (SUDEP). The precise mechanism of SUDEP is still unknown. Due to the low incidence rates of SUDEP (0.2 per 1,000 persons/year in children and 1.0 per 1,000 persons/year in adults)¹⁶, multi-site studies are the key to collecting sufficient datasets for statistical analysis. This investment by NINDS over nearly five years promises to catalyze research on SUDEP and dramatically enhance our understanding of this devastating phenomenon. The participating institutions of CSR include Baylor College of Medicine, University Hospitals Cleveland Medical Center (UH), Lurie Children’s Hospital of Chicago, Columbia University, Harvard University, NYU School of Medicine (NYU), Northwestern University (NW), Texas Children’s Hospital, Thomas Jefferson University (TJU), UCLA, UCSF, University College London (UCL), University of Iowa (UIowa), University of Kentucky (UKY), University of Michigan, and The University of Texas Health Science Center at Houston. CSR collects seven types of clinical data for analysis: patient reports from epilepsy monitoring units (EMUs), electroencephalography (EEG) signal data, imaging data, bio-chemistry data, DNA data, follow-up forms, and SUDEP forms. UH, NYU, NW, UCLA, UCL, TJU, and UIowa are the seven CSR institutions that contribute clinical data to the central data repository. Figure 1 shows the clinical data flow from distributed institutions into the CSR central data repository using SFTP. Multi-modality clinical data are captured and uploaded from individual institutions and processed at the central data repository^9,19. Data processing tasks such as data integration, data curation, and data conversion happen concurrently and are handled by different working groups of CSR.

Figure 1. — Clinical research data flow in CSR.

2.2 FAIR, Rigor, and Reproducibility for multi-site Clinical Studies

Two cornerstones of scientific advancement, rigor in designing and performing scientific research and the ability to reproduce research findings, can be facilitated by FAIR data principles¹. FAIR requires research data to be Findable, Accessible, Interoperable, and Reusable. Research data that meet FAIR principles have better machine-actionability and readiness for secondary analysis. Reproducibility of research findings is especially important in view of the inherent biological variations and expected technical and methodological varieties. Rapid sharing of verified raw data and associated data analytics has played a critical role in data reproducibility^17,18. It is important to establish stringent data quality control and quality assurance processes to ensure that the data generated will be broadly referenced and used by the research community. However, data management tools facilitating and enabling FAIR data principles are lacking in general, especially for complex data generated from multi-site collaboratives.

2.3 Gaps and Challenges

Gaps and challenges exist in collecting, managing, and sharing research data in multi-site settings. One is that traditional research data management relies heavily on general-purpose data management software such as spreadsheets. Such software cannot handle complex data management tasks such as facilitating data submission and sharing, and tracking data flow, data provenance, and data use statistics. For data sharing, the traditional approach requires manual effort to build patient cohorts, prepare datasets, and distribute them to researchers on request. The whole process is labor-intensive and error-prone. To overcome such challenges, we created an interactive research data management platform called DaT3M to automate complex tasks such as tracking data flow and provenance, building patient cohorts, and preparing datasets for sharing.

3 Methods

3.1 System Architecture

Figure 2 illustrates the system architecture of DaT3M. With a MySQL²⁴ database and a Ruby on Rails²⁵ application as the backbone, users interact with DaT3M’s function modules through web browsers. After a target patient cohort is built, users can proceed to download the data using our data downloader, which is a command line tool.

DaT3M is implemented using Ruby on Rails, an agile web application development framework with MySQL as the backend database. Its iterative agile development features allow us to rapidly respond to new requirements for data management tasks. We design our function modules following Web Interface-driven Development (WIDD)²⁶, which is an effective software development method for clinical applications by involving domain experts in the process of interface design.

3.2 Data Modeling and Visualization

Three core data models are designed in DaT3M: Patient, Data Type, and Data Status. As depicted in Figure 3, one patient can have many data types while each data type has one data status (i.e., available or not). In the current instance of DaT3M for CSR, there are seven data types: EMU reports (short name as P), EEG signal data (E), MRI imaging data (M), biochemistry data (B), DNA data (D), follow-up forms (F), and SUDEP forms (S). Different types of data are linked by the patient’s unique CSR study ID²⁰.

To provide an intuitive and concise view of patient data and their statuses, we use a colored square box to represent each data type as well as its status. The short name of the data type is enclosed in the box. We use green to indicate that the data status is available and red to convey that the data status is missing or not available. For instance, given a specific patient, a square box with the letter “P” in green indicates that the patient’s EMU reports are available. Figure 4 shows a panel of seven colored square boxes, representing the overall status of one patient dataset in CSR. The data types are evenly spaced horizontally. The color of each data type indicates the availability status of the corresponding patient data. Therefore, the available data for the patient shown in Figure 4 include EMU reports, EEG signal data, MRI imaging data, follow-up forms, and DNA data, while biochemistry data and SUDEP forms are not available.

Figure 4. — Patient data statuses of seven data types: P - EMU reports, E - EEG signal data, M - MRI imaging data, B - biochemistry data, D - DNA data, F - follow-up forms, S - SUDEP forms.
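The data model and the status panel described above can be sketched in a few lines. This is an illustrative Python sketch, not the Ruby on Rails and MySQL implementation used by DaT3M; the class and method names (`Patient`, `status_of`, `status_panel`), the text rendering with `+` and `-`, and the study ID are assumptions introduced here. The example patient mirrors the availability pattern described for Figure 4.

```python
from dataclasses import dataclass, field
from enum import Enum

# Short names and data types as listed in the text for the CSR instance.
DATA_TYPES = {
    "P": "EMU reports",
    "E": "EEG signal data",
    "M": "MRI imaging data",
    "B": "biochemistry data",
    "D": "DNA data",
    "F": "follow-up forms",
    "S": "SUDEP forms",
}


class DataStatus(Enum):
    AVAILABLE = "available"
    MISSING = "missing"


@dataclass
class Patient:
    # Hypothetical fields: the study ID links all data types of one patient.
    study_id: str
    available: set = field(default_factory=set)  # short names with data on file

    def status_of(self, short_name: str) -> DataStatus:
        """Each data type of a patient has exactly one status."""
        if short_name not in DATA_TYPES:
            raise KeyError(f"unknown data type: {short_name}")
        if short_name in self.available:
            return DataStatus.AVAILABLE
        return DataStatus.MISSING

    def status_panel(self) -> str:
        """Text stand-in for the colored boxes: '+' is green, '-' is red."""
        return " ".join(
            f"[{name}{'+' if name in self.available else '-'}]"
            for name in DATA_TYPES
        )


# Made-up study ID; availability follows the Figure 4 description.
patient = Patient(study_id="EXAMPLE-0001", available={"P", "E", "M", "D", "F"})
print(patient.status_panel())  # prints: [P+] [E+] [M+] [B-] [D+] [F+] [S-]
```

Storing only the set of available short names keeps the "one status per data type" rule implicit: anything not in the set is reported as missing.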

3.3 Data Status Tracking

To address the clinical research data management needs of cohort identification and data curation assistance, we created an efficient mechanism for querying patients by their data statuses. DaT3M provides a query widget (see Figure 5) with an appearance similar to the patient status panel, consisting of 7 data status boxes. Importantly, the boxes in the query widget are interactive buttons that users can click to change their color to green, red, or white. White in this query widget means that the underlying data type is not used in the current query. The default status is optional, with the status box button in white; a mouse click changes white to green; a further click changes green to red; and a final click changes red back to white. That is, the order of color changes forms a loop: White → Green → Red → White.

Figure 5. — Filter or find patients with P (EMU reports) and E (EEG signal data) available.

Therefore, constructing a query in DaT3M is straightforward. Users can simply click on the status buttons in the query widget to configure them in desired colors. For example, one query task in CSR is to find all patients with both EMU reports and EEG data available in the central data repository. We can efficiently construct such a query as displayed in Figure 5 with two clicks by setting buttons “P” and “E” as green (i.e., available), leaving all the other buttons as white (i.e., optional). The interactive query widget is powerful since it supports various queries with respect to data statuses: a total of 2,187 (= 3⁷) different combinations, where each combination is an independent query.
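The click cycle and the status-based filter can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the DaT3M widget code: the names `Filter`, `click`, and `matches`, and the sample patients with their study IDs, are hypothetical. The sketch also enumerates all button configurations to confirm the count of 3⁷ queries.

```python
from enum import Enum
from itertools import product

DATA_TYPES = ["P", "E", "M", "B", "D", "F", "S"]


class Filter(Enum):
    OPTIONAL = "white"   # data type not used in the query
    AVAILABLE = "green"  # data must be available
    MISSING = "red"      # data must be missing


# Click order described in the text: White -> Green -> Red -> White.
NEXT_STATE = {
    Filter.OPTIONAL: Filter.AVAILABLE,
    Filter.AVAILABLE: Filter.MISSING,
    Filter.MISSING: Filter.OPTIONAL,
}


def click(query, name):
    """Return a new query with the button `name` advanced one step."""
    updated = dict(query)
    updated[name] = NEXT_STATE[query[name]]
    return updated


def matches(query, available):
    """Check one patient's set of available data types against the query."""
    for name, state in query.items():
        if state is Filter.AVAILABLE and name not in available:
            return False
        if state is Filter.MISSING and name in available:
            return False
    return True


# Default widget: every button is white (optional).
query = {name: Filter.OPTIONAL for name in DATA_TYPES}
# Two clicks reproduce the Figure 5 query: P and E must be available.
query = click(click(query, "P"), "E")

# Hypothetical patients keyed by made-up study IDs.
patients = {
    "EXAMPLE-0001": {"P", "E", "M"},
    "EXAMPLE-0002": {"P", "F"},
    "EXAMPLE-0003": {"E", "P", "D", "S"},
}
selected = sorted(pid for pid, avail in patients.items() if matches(query, avail))
print(selected)  # prints: ['EXAMPLE-0001', 'EXAMPLE-0003']

# Each of the 7 buttons has 3 states, so there are 3**7 distinct queries.
all_queries = list(product(Filter, repeat=len(DATA_TYPES)))
print(len(all_queries))  # prints: 2187
```

Because white buttons impose no constraint, the all-white default matches every patient, which is one of the 2,187 configurations.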

3.4 Robust Data Slice Downloader

DaT3M integrates with the MEDCIS⁹ query engine, which employs concepts from the Epilepsy and Seizure Ontology (EpSO)²⁷ to build patient cohorts. Once a patient cohort is created, DaT3M allows end users to download the cohort data using a scalable download framework as shown in Figure 6. A patient cohort often contains thousands of data files totaling terabytes in size. It is not practical to retrieve a patient cohort with the typical browser-based method of downloading files one at a time. To support scalable and smooth data download, we created a Ruby gem called CSR Data Inventory, a software package supporting batch downloading of CSR patient cohorts. This gem has been tested on multiple operating systems (Windows, Linux, and OS X)²⁸.

There are five steps involved in the downloading workflow (see Figure 6). First, the user gets a data download token from the built patient cohort. Second, the user runs the DaT3M data downloader on the local machine and feeds in the data download token. Third, a file list is downloaded containing the catalog of all files to be downloaded. Fourth, the data downloader traverses the file list and sends requests to retrieve data files using the same data download token. In the last step, files are downloaded and stored on the user’s local device. The whole workflow is highly automated: the user only needs to start the program and feed in the data download token. The tool is called a data slice downloader because the user can select which data types to download when generating the data download token. Typically, researchers only need specific types of data for their study, such as EEG data for signal processing and analysis or MRI data for imaging-related investigation. Slicing data can effectively reduce the workload of data download and researchers’ local data management. Besides, the data downloader is robust since it supports resumable download. The first-downloaded file list contains metadata about every file to be downloaded, such as file name, path, and size. With this information, the downloader can check whether a data file has already been downloaded completely. After an interruption, the data downloader skips the files that are already downloaded and continues with the remaining data files.
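The resumable behavior described above can be sketched as a size check against the file list. The following Python sketch is only an illustration of that idea and does not reproduce the CSR Data Inventory gem: the function names, the dictionary keys `name`, `path`, and `size`, and the in-memory stand-in for the server are assumptions, and `fetch` is a placeholder for the token-authenticated request.

```python
import os
import tempfile


def is_complete(entry, dest_dir):
    """A file counts as downloaded if it exists locally with the listed size."""
    local_path = os.path.join(dest_dir, entry["path"], entry["name"])
    return os.path.isfile(local_path) and os.path.getsize(local_path) == entry["size"]


def download_all(file_list, dest_dir, fetch):
    """Download every entry that is not already complete; return fetched names.

    `fetch(entry)` is a placeholder for the authenticated request that uses
    the data download token; it must return the file content as bytes.
    """
    fetched = []
    for entry in file_list:
        if is_complete(entry, dest_dir):
            continue  # finished in an earlier run, so skip it
        target_dir = os.path.join(dest_dir, entry["path"])
        os.makedirs(target_dir, exist_ok=True)
        with open(os.path.join(target_dir, entry["name"]), "wb") as handle:
            handle.write(fetch(entry))
        fetched.append(entry["name"])
    return fetched


# Made-up file list and in-memory "server", for demonstration only.
server = {"a.edf": b"x" * 10, "b.edf": b"y" * 20}
file_list = [
    {"name": name, "path": "EXAMPLE-0001/eeg", "size": len(data)}
    for name, data in server.items()
]

with tempfile.TemporaryDirectory() as dest:
    # Simulate an interrupted run that only retrieved the first file.
    first_run = download_all(file_list[:1], dest, lambda e: server[e["name"]])
    # Restarting with the full list skips the file that is already complete.
    second_run = download_all(file_list, dest, lambda e: server[e["name"]])
    print(first_run, second_run)  # prints: ['a.edf'] ['b.edf']
```

Comparing the local size with the size in the file list also means that a partially written file fails the check and is fetched again on the next run.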

3.5 Evaluation Method

To evaluate the usability of DaT3M, we invite health informatics researchers and healthcare workers who might be involved in data downloading and analysis using CSR data. We design a structured one-on-one interview that consists of two sections. In the first section, each participant is instructed to complete five tasks covering the system’s key features (see Table 1). The participant is asked to share their screen and encouraged to think aloud while performing the tasks. The interviewer observes and records the interview, and measures the time and the number of steps that the participant spends on each task.

Table 1. Five usability evaluation tasks.

Task	Task Description
Task 1	Pick the UH data source and filter cases with MRI imaging
Task 2	Select one subject and explore the data details
Task 3	Build a query for subjects with generalized tonic-clonic seizure only during admission
Task 4	Generate a data download token for the query built in Task 3
Task 5	Download the data of patients found in Task 3

In the second section, each participant is provided with a 10-question survey based on the participant’s experience in the first section. Survey questions are as follows:

1. I think that I would like to use this system frequently.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people would learn to use this system very quickly.
8. I found the system very cumbersome to use.
9. I felt very confident using the system.
10. I needed to learn a lot of things before I could get going with this system.

We use the System Usability Scale (SUS)²¹, in which each question is answered on a 5-point Likert scale ranging from strongly disagree (1) to strongly agree (5). SUS is a widely used usability evaluation questionnaire that allows us to evaluate our system with a small sample of participants.
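For reference, the standard SUS scoring procedure converts the ten answers into a score between 0 and 100: each odd-numbered (positively worded) item contributes its answer minus 1, each even-numbered (negatively worded) item contributes 5 minus its answer, and the sum is multiplied by 2.5. The Python sketch below applies this standard procedure to made-up answers; the function name `sus_score` and the sample responses are illustrative and are not data from this study, and averaging the per-participant scores is shown as the common convention rather than as a description of our analysis scripts.

```python
def sus_score(responses):
    """Score one participant's ten SUS answers (each 1-5) on a 0-100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for index, answer in enumerate(responses, start=1):
        if index % 2 == 1:
            total += answer - 1  # positively worded items 1, 3, 5, 7, 9
        else:
            total += 5 - answer  # negatively worded items 2, 4, 6, 8, 10
    return total * 2.5


# Made-up answers for two hypothetical participants, not study data.
answers = [
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
]
scores = [sus_score(a) for a in answers]
mean_score = sum(scores) / len(scores)
print(scores, mean_score)  # prints: [100.0, 75.0] 87.5
```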

4 Results

4.1 Patient-level Data Availability Tracked in DaT3M

DaT3M is currently deployed at The University of Texas Health Science Center at Houston.²² Table 2 shows that as of August of 2021, DaT3M had tracked 10,678 data components for 2,741 patients for CSR. These patients are from seven collaborating institutions of CSR, including 1,086 from UH, 297 from NYU, 235 from UCLA, 450 from NW, 210 from TJU, 293 from UCL, and 170 from UIowa. It can be seen that different institutions show disparate patterns of data availability. For example, only UH, UCLA, and UCL have contributed imaging data; and bio-chemistry data only come from UH and TJU.

Table 2. Summary statistics of query types for usability evaluation.

Center	# of Patients	EMU Reports	EEG Recordings	MRI Imaging	Bio-chemistry Data	DNA Data	Follow-up Forms	SUDEP Forms
UH	1,086	1,644	1,676	126	137	456	981	22
NW	450	504	505	0	0	7	296	1
NYU	297	288	308	0	0	124	283	1
UCLA	237	215	235	207	0	0	143	0
TJU	210	231	251	0	40	135	161	2
UCL	293	345	294	296	0	0	288	3
UIowa	170	171	171	0	0	0	137	2
Total	2,741	3,398	3,440	629	177	722	2,289	30

4.2 Data Overview Dashboard

DaT3M provides a dashboard that gives an overview of the data availability, statistics, and distributions of CSR, as shown in Figure 7. Six interactive charts present different aspects of CSR data. Bar chart A visualizes the same information as Table 2. Panel B lists the total number of datasets for each data type. Charts C and D depict the data distribution in terms of gender and age, respectively. Pie chart E shows the number of patients from each center. Pie chart F shows the total number of SUDEP cases and their percentage in the whole CSR data.

4.3 Center Data Portal

DaT3M provides a center data portal for each CSR center to track their data submission and data completeness statuses. Figure 8 shows a screenshot of the data portal of CSR center UH, consisting of four areas highlighted and labeled from 1 to 4. Area 1 shows the total number of patients from this center. Area 2 displays the number of data components of each data type. DaT3M’s data status tracking widget is labeled as area 3. Users can click on each status box to change its color and combine various data statuses as a filter to query the center’s data records shown in area 4.

5 Evaluation

Seven researchers from four different health institutions participated in one-on-one interviews for the usability evaluation of DaT3M. For each task in the first section of the interviews, we calculated the average time in seconds taken for the task, the average number of steps taken to perform the task, the number of participants who completed the task with the minimum required steps, and the number of participants who failed to complete the task (see Table 3). In Task 1, the participants were asked to use the filter function in the data tracker to find patients with MRI imaging data. All the participants were able to complete this task, and three of them completed it with the minimum required steps. In Task 2, the participants were asked to explore a patient’s data records, and no major issue was found. This task has no minimum number of required steps since it is an exploratory task. Task 3 asked the participants to use the query builder to find patients with a specific type of seizure. Participants with more domain knowledge of neurology were able to complete the task faster than those with less, and four participants completed this task with the minimum steps. Task 4 asked the participants to generate a unique token for data download. This is a relatively simple task, and all the participants completed it without any issue. Task 5 is the most complicated task, asking participants to use the command line tool to download research data. Two participants failed this task because their institutional IT policies did not allow installation of our data download tool, while those who succeeded praised the user-friendliness of the tool.

Table 3. Average time and steps taken for performing usability evaluation tasks.

Task	Average time (in seconds)	Average steps (minimum steps)	Participants achieved minimum steps (out of 7)	Participants failed on the task (out of 7)
1. Pick the UH data source and filter cases with MRI imaging.	59	4.0 (3)	3	0
2. Select one patient and explore the data details.	76	9.5 (-)	-	0
3. Build a query to find patients with generalized tonic-clonic seizure only during admission.	57	7.7 (7)	4	0
4. Generate a data download token for the query built in Task 3	37	4.2 (4)	4	0
5. Download the data of subjects found in Task 3.	62	6.6 (5)	1	2

Figure 9 summarizes the participants’ responses to the 10-question survey in the second section of the interviews. Most participants thought that the system was easy to use and that they would like to use it frequently. A final SUS score of 85.3 was achieved. Usually, a SUS score above 68 is considered acceptable, and a score above 85 is considered excellent²³.

6 Discussion

The primary motivation and objective of DaT3M is to promote the FAIRness of research data. DaT3M allows data to be more findable not only at the data set level, but also at data modality and subject levels. Its integrated query engine MEDCIS allows users to conveniently create cohort-level data subsets. DaT3M makes data more accessible through a streamlined downloading process. DaT3M supports data interoperability by adopting common data formats (e.g., European Data Format, EDF) and domain ontologies (e.g., EpSO). DaT3M integrates data submitted from different centers using a general common data model: patient, data type, and data status. Its data scope can be expanded to include data from additional sites. For example, consented patients in EpiToMe²⁹, our bespoke EHR for epilepsy care deployed at Memorial Hermann Hospitals, are being added to DaT3M. DaT3M provides support for data quality control and quality assurance processes through the interactive data status panel that tracks data curation workflow and data provenance. High-quality curated data is the key to achieving data reusability.

Patient cohort building and quick data sharing. DaT3M’s data sharing mechanism overcomes the traditional way of manual data preparation and distribution per request. It integrates patient cohort building, data preparation, and data downloading in a highly automated pipeline. With this pipeline, investigators with proper privileges can locate their target datasets and start to download them within a few minutes.

Evaluation feedback. It can be seen from Figure 9 that most participants share similar attitudes toward the survey questions. Most of them (6 out of 7) agreed that the system was easy to use and learn. Most of them (6 out of 7) felt confident while using the system, and they also agreed that other people could learn how to use it very quickly. In additional feedback from the survey, one participant thought that the system could save him a lot of time in locating the patient data he needed. Other feedback includes functional suggestions, such as providing a different way of downloading data in case the command line tool does not work, which will be addressed in our future work.

General applicability. Although DaT3M has been developed for the CSR study, its data modeling is generally applicable to other multi-site studies. Its mechanisms for querying by data status, building patient cohorts, and downloading data slices can also be adapted for other studies.

Limitation. The evaluation of DaT3M is preliminary due to the small number of participants, although the excellent usability conclusion is valid according to the SUS method. A full-scale evaluation may need to involve crowdsourcing methods such as Amazon Mechanical Turk³⁰. Another limitation of DaT3M is that the data downloader must be installed, which was prevented by one of the four evaluation institutions. Alternate data distribution mechanisms such as SFTP are needed to address such institutional policy challenges.

7 Conclusion

In this paper, we introduced DaT3M, a novel research data management tool for large-scale multi-site clinical studies. It models patient data with data modalities and statuses, provides intuitive visualizations of patient data, and automates a pipeline from patient cohort building to data sharing. DaT3M has served as the data management platform for the Center for SUDEP Research. It facilitates data submission and sharing, tracks data flow and data provenance, and thus improves the FAIRness of CSR data. The pilot evaluation with researchers from multiple institutions indicated that DaT3M achieved a high level of usability.

Acknowledgement

This research was supported by the National Institutes of Health (NIH) through grants U01NS090408, U01NS090405, and R01NS116287. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

References

[1].Wilkinson MD, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016 Mar 15;3:160018. doi: 10.1038/sdata.2016.18. Erratum in: Sci Data. 2019 Mar 19;6(1):6. PMID: 26978244. [DOI] [PMC free article] [PubMed]
[2].Krishnankutty B, Bellary S, Kumar NB, Moodahadu LS. Data management in clinical research: an overview. Indian journal of pharmacology. 2012 Mar;44(2):168. doi: 10.4103/0253-7613.93842. [DOI] [PMC free article] [PubMed] [Google Scholar]
[3].Edelstein H. Collaborative research partnerships for knowledge mobilisation. Evidence & Policy: A Journal of Research, Debate and Practice. 2016 May 25;12(2):199–216. [Google Scholar]
[4].Chade DC, Shariat SF, Cronin AM, Savage CJ, Karnes RJ, Blute ML, Briganti A, Montorsi F, Van Der Poel HG, Van Poppel H, Joniau S. Salvage radical prostatectomy for radiation-recurrent prostate cancer: a multi-institutional collaboration. European urology. 2011 Aug 1;60(2):205–10. doi: 10.1016/j.eururo.2011.03.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
[5].Shields CJ, Tiret E, Winter DC. International Rectal Carcinoid Study Group. Carcinoid tumors of the rectum: a multi-insitutional international collaboration. Annals of surgery. 2010 Nov 1;252(5):750–5. doi: 10.1097/SLA.0b013e3181fb8df6. [DOI] [PubMed] [Google Scholar]
[6].Forjuoh SN, Helduser JW, Bolin JN, Ory MG. Challenges Associated with Multi-institutional Multi-site Clinical Trial Collaborations: Lessons from a Diabetes Self-Management Interventions Study in Primary Care. J Clin Trials. 2015;5:219. . doi: 10.4172/2167-0870.1000219. [Google Scholar]
[7].Murphy SN, Mendis M, Hackett K, Kuttan R, Pan W, Phillips LC, Gainer V, Berkowicz D, Glaser JP, Kohane I, Chueh HC. Architecture of the open-source clinical research chart from Informatics for Integrating Biology and the Bedside. AMIA Annual Symp Proc. 2007;pp. 548-52. [PMC free article] [PubMed]
[8].Tao S, Cui L, Wu X, Zhang G. Facilitating Cohort Discovery by Enhancing Ontology Exploration, Query Management and Query Sharing for Large Clinical Data Repositories. AMIA Annual Symp Proc. 2017;2017:1685–1694. [PMC free article] [PubMed] [Google Scholar]
[9].Zhang GQ, Cui L, Lhatoo S, Schuele SU, Sahoo SS. MEDCIS: multi-modality epilepsy data capture and integration system. AMIA Annual Symp Proc. 2014;2014:1248–1257. [PMC free article] [PubMed] [Google Scholar]
[10].Harris PA, Taylor R, Minor BL, Elliott V, Fernandez M, O’Neal L, McLeod L, Delacqua G, Delacqua F, Kirby J, Duda SN. The REDCap consortium: Building an international community of software platform partners. Journal of biomedical informatics. 2019 Jul 1;95:103208. doi: 10.1016/j.jbi.2019.103208. [DOI] [PMC free article] [PubMed] [Google Scholar]
[11].MACRO Clinical Data Management https://www.macro4.com/solutions/data-management/. Accessed on 2022 Jan 6.
[12].Oracle Clinical Data Management https://www.oracle.com/technical-resources/documentation/hsgbu-clinical.html. Accessed on 2022 Jan 6. [13]
[13].OpenClinica Data Management System https://www.openclinica.com/. Accessed on 2022 Jan 6.
[14].OpenCDMS https://www.opencdms.org/. Accessed on 2022 Jan 6.
[15].Center for SUDEP Research http://sudepresearch.org/. Accessed on 2022 Jan 6.
[16].Harden C, Tomson T, Gloss D, Buchhalter J, Cross JH, Donner E, French JA, Gil-Nagel A, Hesdorffer DC, Smithson WH, Spitz MC, Walczak TS, Sander JW, Ryvlin P. Practice guideline summary: Sudden unexpected death in epilepsy incidence rates and risk factors: Report of the Guideline Development, Dissemination, and Implementation Subcommittee of the American Academy of Neurology and the American Epilepsy Society. Neurology. 2017 Apr 25;88(17):1674–1680. doi: 10.1212/WNL.0000000000003685. . doi: 10.1212/WNL.0000000000003685. Erratum in: Neurology. 2019 Nov 26;93(22):982. Erratum in: Neurology. 2020 Mar 3;94(9):414. PMID: 28438841. [DOI] [PubMed] [Google Scholar]
[17].Downing SM. Reliability: on the reproducibility of assessment data. Medical education. 2004 Sep;38(9):1006–12. doi: 10.1111/j.1365-2929.2004.01932.x. [DOI] [PubMed] [Google Scholar]
[18].Miyakawa T. No raw data, no science: another possible source of the reproducibility crisis. Mol Brain 13, 24 (2020). https://doi.org/10.1186/s13041-020-0552-2. [DOI] [PMC free article] [PubMed]
[19].Sahoo SS, Zhao M, Luo L, Bozorgi A, Gupta D, Lhatoo SD, Zhang GQ. OPIC: ontology-driven patient information capturing system for epilepsy. AMIA Annual Symp Proc. 2012;2012:799–808. [PMC free article] [PubMed] [Google Scholar]
[20].Zhang GQ, Tao S, Xing G, Mozes J, Zonjy B, Lhatoo SD, Cui L. NHash: Randomized N-Gram Hashing for Distributed Generation of Validatable Unique Study Identifiers in Multicenter Research. JMIR medical informatics. 2015 Oct;3(4) doi: 10.2196/medinform.4959. [DOI] [PMC free article] [PubMed] [Google Scholar]
[21].Brooke J. 1996. SUS: A quick and dirty usability scale. Usability evaluation in industry. 189-94. London: Taylor & Francis. [22]
[22]. DaT3M for the Center for SUDEP Research. https://datascience.uth.edu. Accessed on 2022 Jan 6.
[23].Bangor A, Kortum PT, Miller JT. An empirical evaluation of the system usability scale. Intl. Journal of Human-Computer Interaction. 2008 Jul 29;24(6):574–94.
[24].MYSQL Database https://dev.mysql.com/. Accessed on 2022 Jan 6.
[25].Ruby On Rails Web Development Framework http://rubyonrails.org/. Accessed on 2022 Jan 6.
[26].Tao S, Walter BL, Gu S, Zhang GQ. Web-Interface-Driven Development for Neuro3D, a Clinical Data Capture and Decision Support System for Deep Brain Stimulation. International Conference on Health Information Science. 2016 Nov 5;pp. 31-42.
[27].Sahoo SS, Lhatoo SD, Gupta DK, Cui L, Zhao M, Jayapandian C, Bozorgi A, Zhang GQ. Epilepsy and seizure ontology: towards an epilepsy informatics infrastructure for clinical research and patient care. J Am Med Inform Assoc. 2014 Jan-Feb;21(1):82–9. doi: 10.1136/amiajnl-2013-001696. Epub 2013 May 18. PMID: 23686934; PMCID: PMC3912711.
[28].CSR Data Inventory Gem https://rubygems.org/gems/csr data inventory.
[29].Tao S, Lhatoo S, Hampson J, Cui L, Zhang G. A Bespoke Electronic Health Record for Epilepsy Care (EpiToMe): Development and Qualitative Evaluation. J Med Internet Res. 2021;23(2):e22939. doi: 10.2196/22939. PMID: 33576745.
[30].Amazon Mechanical Turk https://www.mturk.com/. Accessed on 2022 Jan 6.

