Abstract
Insight is a Semantic Web technology-based platform to support large-scale secondary analysis of healthcare data for neurology clinical research. Insight features the novel use of: (1) provenance metadata, which describes the history or origin of patient data, in clinical research analysis, and (2) support for patient cohort queries across multiple institutions conducting research in epilepsy, which is one of the most common neurological disorders, affecting 50 million persons worldwide. Insight is being developed as a healthcare informatics infrastructure to support a national network of eight epilepsy research centers across the U.S. funded by the U.S. Centers for Disease Control and Prevention (CDC). This paper describes the use of the World Wide Web Consortium (W3C) PROV recommendation for provenance metadata that allows researchers to create patient cohorts based on the provenance of the research studies. In addition, the paper describes the use of a description logic-based OWL2 epilepsy ontology for cohort queries with “expansion of query expression” using ontology reasoning. Finally, the evaluation results for the data integration and query performance are described using data from three research studies with 180 epilepsy patients. The experiment results demonstrate that Insight is a scalable approach to using semantic provenance metadata for context-based data analysis in healthcare informatics.
Keywords: Provenance metadata, Ontology streaming, Semantic integration of healthcare data
I. Introduction
The recently announced Brain Research through Advancing Innovative Neurotechnologies (BRAIN) research initiative [1] is an ambitious vision to comprehensively understand the human brain, and it is similar in scope to the Human Genome Project (HGP) [2]. However, sophisticated experimental technologies are generating new kinds of neurological datasets at an increasing rate and in greater volumes, which poses a significant challenge to the objectives of the BRAIN initiative [3]. This requires the development of new data integration and analysis platforms that can adapt and scale with datasets across multiple research studies, neurotechnologies, and geographically distributed research centers. These data management platforms also need to represent complex multi-dimensional (e.g., temporal and spatial) and multi-modal information (e.g., magnetic resonance imaging, electroencephalogram signals) at different levels of granularity through the use of knowledge representation techniques.
Existing data integration approaches in biomedical informatics are primarily focused on the “bench side” (e.g., proteomics and genomics) with limited work on addressing the requirements of the “bedside” (e.g., clinical and healthcare research). Hence, a vast majority of clinical research studies are currently undertaken using paper-based data entry forms, MS Excel or MS Access database for data management [4]. In addition, researchers need to use critical provenance metadata, which describes the origin or history of research data, to ensure data quality and scientific reproducibility.
There are only a few clinical research systems that use Semantic Web technologies for data management. For example, the Informatics for Integrating Biology and the Bedside (I2B2) is a large-scale initiative to integrate and analyze electronic health records (EHR) for clinical research [12]. I2B2 uses a traditional data warehouse approach with an Extract Transform Load (ETL) process to integrate EHR data and uses multiple ontologies, such as components of the International Classification of Diseases (ICD-9). Recently, I2B2 has developed an API to access biomedical ontologies from the National Center for Biomedical Ontology (NCBO) using the NCBO REST API. Similarly, the Strategic Health IT Research Project for Secondary Use of Electronic Health Record (SHARPn) is a National Institutes of Health (NIH) funded project to use natural language processing techniques to extract structured data from EHR systems [13]. SHARPn uses ontologies together with the clinical Text Analysis and Knowledge Extraction System (cTAKES), which is based on the IBM Unstructured Information Management Architecture (UIMA) pipeline, for named entity recognition (NER), negation detection, and inferring temporal values.
The Insight clinical research platform addresses many of these requirements by using Semantic Web technologies for representing and analyzing provenance metadata and epilepsy domain knowledge. The platform leverages the World Wide Web Consortium (W3C) recommendations, such as PROV (for provenance modeling) and OWL2 (Web Ontology Language) for data integration and cohort queries. The unique features of Insight include the use of semantic provenance metadata and ontology-based subsumption reasoning to support context-based cohort queries [5] across multiple studies. The Insight platform is being developed as part of a healthcare informatics infrastructure for a national network of neurological disorder research centers funded by the U.S. Centers for Disease Control and Prevention (CDC).
A. Advancing Research in Neurological Disorders: The Managing Epilepsy Well (MEW) Network
The MEW Network consists of eight research centers across the US that aim to develop techniques that enable persons with epilepsy to better manage their condition. Epilepsy is one of the most common neurological disorders, affecting approximately 65 million persons worldwide, with 200,000 new cases diagnosed each year in the US alone [6]. Patients with epilepsy may suffer from repeated seizures, manifested as physical or behavioral changes, which lead to a significantly increased risk of injury or mortality due to uncontrolled seizures [7]. Despite advances in treatment approaches and medication, persons with epilepsy, especially in the lower socio-economic strata, often have poor quality of life and are at risk of negative health outcomes [8]. In addition to the daily challenges faced by epilepsy patients and families, the wider community is also affected in terms of lost productivity and direct medical costs [9]. Effective management of epilepsy requires the involvement of clinicians and the active participation of patients, as well as members of the immediate community, to develop “epilepsy self-management” techniques. These techniques are expected to improve outcomes for persons with epilepsy. The MEW Network research centers have developed several successful self-management approaches [10] [11] to address the many challenges faced by persons with epilepsy. However, due to data heterogeneity and the lack of effective data integration tools, there have been few of the pooled or secondary data analyses that can only be conducted with larger samples or aggregate data. A recent survey of the MEW Network centers identified significant challenges that prevent data integration and analysis [4], including:
Use of multiple measures for same information. The centers used a wide variety of measures for collecting data about the same information elements, for example 10 different measures were used for collecting the cognitive functions of a patient and 5 separate measures were used for depression [4].
Use of disparate values for same measures. Even when the centers used the same measures, they assigned disparate values to the measures. For example, to collect data about the frequency of seizures in patients, the centers collected data about the past 30 days or 4 weeks or 4 months or 12 months.
Lack of support for provenance metadata. The research study protocols describe the provenance of the data, for example the specific age group and income status of the study participants. This study provenance metadata is critical for researchers to ensure that healthcare data from different centers are comparable and to validate the quality of the data.
II. Integrating and Analyzing Healthcare Data in the MEW Network: Role of Domain and Provenance Ontology
The primary aim of the Insight platform is to leverage the large amount of epilepsy and self-management data collected since 2007 to: (a) support cross-cohort queries to test new research hypotheses, and (b) guide the development of new intervention techniques for managing epilepsy.
A. Insight System Architecture and Data Access
The research study data imported and integrated into the Insight platform are de-identified to remove any Protected Health Information (PHI) of the participants. In addition, access to the data is governed by the MEW Network informatics consortium, which consists of the eight MEW Network centers. At present, data from approximately eight studies have either been made available or are in the process of being added to the Insight platform. Each site completes a Data Access and Use Agreement (DAUA) that governs the use of the de-identified data for clinical research studies within the MEW Network.
The DAUA and the review of the proposed research hypothesis by the MEW Network members ensure that the study data is used only for peer-reviewed scientific research and that the results are made available to all members of the community. The most important challenge faced during the development of Insight was data heterogeneity due to the use of different data elements and values. For example, the Targeted Illness Management for Epilepsy and Mental Illness (TIME) project, conducted by the MEW center at Case Western Reserve University (CWRU), categorized participant income as: (a) “less than $25,000”, (b) “$25,000 to $50,000”, and (c) “more than $50,000”. In contrast to the TIME study, the Web Epilepsy Awareness, Support, and Education (WebEase) project, conducted at Emory University, categorized income as “less than $20,000” and “less than $25,000”.
Hence, it is extremely difficult to use automated data integration approaches to map income data categorized as “less than $25,000” in the TIME study with income data in the WebEase project with value “less than $25,000” or “less than $20,000”. To address this and similar data heterogeneity challenges, the Ontology Layer of Insight used a list of “common data elements” (CDE), which was collaboratively developed to represent the study data. The CDEs used terms defined in existing terminological systems, such as the Institute of Medicine's Report on “Epilepsy Across the Spectrum: Promoting Health and Understanding 2012,” the recommended standards for epilepsy surveillance studies, the National Institute of Neurological Disorders and Stroke (NINDS) Common Data Elements (CDE), and the Behavioral Risk Factor Surveillance System (BRFSS) Questionnaire [12]. The MEW Network CDEs were modeled in an ontology by extending an existing domain ontology for epilepsy research called the Epilepsy and Seizure Ontology (EpSO) [12].
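The ambiguity described above can be illustrated with a minimal sketch that treats each income category as a numeric interval and lists every category in the other study that it overlaps. The category labels are taken from the text; the half-open interval encoding and the function name `overlapping_categories` are assumptions made for illustration and are not part of Insight.

```python
from math import inf

# Income categories as half-open intervals [low, high) in US dollars.
# Labels come from the text; the interval encoding is an assumption of this sketch.
TIME_CATEGORIES = {
    "less than $25,000": (0, 25_000),
    "$25,000 to $50,000": (25_000, 50_000),
    "more than $50,000": (50_000, inf),
}
WEBEASE_CATEGORIES = {
    "less than $20,000": (0, 20_000),
    "less than $25,000": (0, 25_000),
}


def overlapping_categories(interval, target_categories):
    """Return the labels of all target categories that overlap the interval."""
    low, high = interval
    return [
        label
        for label, (t_low, t_high) in target_categories.items()
        if low < t_high and t_low < high
    ]


# For each TIME category, list the WebEase categories it could correspond to.
candidates = {
    label: overlapping_categories(interval, WEBEASE_CATEGORIES)
    for label, interval in TIME_CATEGORIES.items()
}

for label, matches in candidates.items():
    status = "ambiguous" if len(matches) > 1 else "unambiguous or unmapped"
    print(f"TIME {label!r} -> WebEase {matches} ({status})")
```

In this sketch the TIME value “less than $25,000” overlaps both WebEase values, so no automatic one-to-one mapping exists, which is the situation that motivates mapping both studies to a shared set of CDEs.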
B. Epilepsy and Seizure Ontology: A Common Semantic Framework for Epilepsy Research
The Epilepsy and Seizure Ontology (EpSO) was developed using OWL2 to serve as a formal knowledge model for the epilepsy research community and support a variety of software platforms, such as managing patient information in epilepsy monitoring units [13], processing clinical free text [14], and managing multi-modal EEG and electrocardiogram (ECG) data [15]. EpSO (see Footnotes) models terms described by the “four-dimensional classification” approach, including seizures (representing abnormal electrical activity in the brain), the location of seizures, the cause of the epilepsy, and related medical conditions [12]. It re-uses ontology concepts defined in the Foundational Model of Anatomy (FMA) [16] for modeling detailed brain anatomy, drug information (brand names and drug ingredients) from RxNorm (developed at the US National Library of Medicine (NLM)) [17], and evoked potentials from the Neural Electromagnetic Ontologies (NEMO) [18].
Hence, it was intuitive to extend EpSO to model the MEW Network CDEs, which also supports interoperability between the data generated in hospital environments, such as the epilepsy-monitoring unit, and community-based MEW Network research studies. The extension of EpSO for the MEW Network incorporates terms to describe socioeconomic measures (e.g., employment status, education and household status), mental health status measures (e.g., Patient Health Questionnaire (PHQ-9) and Quality of Life (QOLIE-10)), and health status. In addition, the detailed classification of epilepsy syndromes in EpSO is re-used for representing the seizure-related information collected in the MEW Network.
The EpSO classes describe various features of epilepsy syndromes that are used as reference by researchers to correlate various attributes, such as symptoms (e.g., EEG feature EpSO:Spike or EpSO:PolySpike) and preferred medication (e.g., EpSO:Lamotrigine or EpSO:Zonisamide). EpSO refers to the http://www.case.edu/EpSO.owl# namespace. The EpSO classes underpin the data transformation workflow of the Insight platform using mappings between the research study data and the MEW Network CDEs (modeled as EpSO classes). However, each MEW research study is conducted using different study protocols, for example each study uses specific “inclusion criteria” and “exclusion criteria” to recruit participants.
These study-specific inclusion and exclusion criteria constitute the provenance metadata of each research study and are essential for: (1) accurate comparison of data across different studies, and (2) avoiding incorrect inferences from data analysis due to study-specific bias (e.g., the TIME study enrolled epilepsy patients who also suffered from severe mental illness). Hence, EpSO imported and extended ontology classes from the W3C PROV Ontology (PROV-O) [19] to model study provenance information.
C. W3C PROV Ontology: Modeling Clinical Research Provenance Metadata
Provenance metadata describes the origin or history of data and it is essential to understand the context of research data and support scientific reproducibility [20]. The W3C PROV specifications were developed to serve as a reference model for representing provenance information and consist of the PROV Data Model (PROV-DM) [21], the PROV Ontology PROV-O (modeled using OWL2), and the PROV Constraints [22]. Specifically, PROV-O consists of a core set of classes and properties that can be extended to model domain-specific provenance information, which aims to support greater interoperability between provenance-enabled data management systems. The three core classes of PROV-O, namely prov:Entity, prov:Activity, and prov:Agent, were used in the extended version of EpSO to model the provenance metadata of the MEW Network research studies. prov refers to the http://www.w3.org/ns/prov# namespace.
Research studies recruit participants based on specific inclusion criteria, which describe the set of requirements a person needs to satisfy to qualify for inclusion in the study, and exclusion criteria, which are attributes that disqualify a person from participating in a given study. For example, both the TIME and WebEase studies specify that a person should be “older than 18 years of age” with a diagnosis of epilepsy as inclusion criteria. Similarly, the TIME study specifies dementia (decline in mental abilities that restricts daily functioning) as an exclusion criterion. Together, the inclusion and exclusion criteria describe a provenance context for integrating data, supporting cohort queries, and analyzing data across multiple research studies. We classified the provenance of the MEW Network research studies into three categories:
Demography and socio-economic criteria (e.g., older than 18 years of age)
Clinical diagnosis criteria (e.g., diagnosed with mental illness)
Physiological and mental capability criteria (e.g., no cognitive limitation for communication)
The provenance metadata corresponding to these three categories were modeled using OWL2 existential restrictions on EpSO object properties [5]. In the next section, we describe how the extended EpSO ontology with provenance metadata is used to support various types of user queries in the Insight platform.
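As a rough illustration of what such existential restrictions could look like, the sketch below renders OWL2 axioms in Turtle syntax for one study. The two namespaces are the ones given in the text. The class name `TIMEStudy`, the property names `hasInclusionCriterion` and `hasExclusionCriterion`, the filler classes, and the choice of `prov:Activity` as the superclass are all hypothetical and are not taken from EpSO.

```python
# Namespace IRIs: epso and prov are from the text; owl and rdfs are the standard W3C IRIs.
PREFIXES = {
    "epso": "http://www.case.edu/EpSO.owl#",
    "prov": "http://www.w3.org/ns/prov#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def study_axioms(study, criteria):
    """Render a study class with one existential restriction per (property, filler) pair.

    All local names passed in are hypothetical placeholders for this sketch.
    """
    lines = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in PREFIXES.items()]
    restrictions = [
        "    [ a owl:Restriction ;\n"
        f"      owl:onProperty epso:{prop} ;\n"
        f"      owl:someValuesFrom epso:{filler} ]"
        for prop, filler in criteria
    ]
    # Assumption: the study is modeled as a kind of prov:Activity.
    superclasses = ["    prov:Activity"] + restrictions
    lines.append("")
    lines.append(f"epso:{study} a owl:Class ;")
    lines.append("  rdfs:subClassOf")
    lines.append(" ,\n".join(superclasses) + " .")
    return "\n".join(lines)


turtle = study_axioms(
    "TIMEStudy",
    [
        ("hasInclusionCriterion", "EpilepsyDiagnosis"),  # clinical diagnosis criterion
        ("hasExclusionCriterion", "Dementia"),           # exclusion named in the text
    ],
)
print(turtle)
```

Each restriction states that every instance of the study class is related through the given property to at least one instance of the filler class, which is the meaning of an OWL2 existential (`owl:someValuesFrom`) restriction.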
III. Provenance-enabled Cohort Queries and Data Analysis using Epilepsy Ontology
The visual query interface allows healthcare researchers to follow a two-phase data query and retrieval workflow that is supported by provenance metadata and OWL2 ontology reasoning over the EpSO class hierarchy [5]. In the following sections, we describe these two phases in detail.
A. Provenance-based study selection
Before creating a study cohort, researchers first identify the appropriate studies based on their inclusion and exclusion criteria, which are modeled as provenance metadata in Insight. Although there has been a lot of work on developing eligibility criteria ontologies for clinical trials, for example the Eligibility Rule Grammar and Ontology (ERGO) [23], there is no existing approach to model the provenance of research studies. Figure 1 illustrates the procedure by which the user interactively selects inclusion and exclusion criteria to identify appropriate research studies that satisfy a specific context. In our earlier work, we defined the notion of “provenance context” to prune the result space for provenance-based queries over large RDF datasets [24]. A detailed evaluation of provenance query performance is presented in Section IV.
Figure 1. Use of provenance metadata representing inclusion and exclusion criteria to select MEW research study.

B. Creating clinical study cohort: Ontology reasoning for supporting query expansion and execution
After the provenance-based study selection, the researchers select a cohort of patients to test their research hypothesis by identifying a set of desired characteristics of the patients. For example, to identify the prevalence of co-occurring mental illness, such as depression, in different sub-populations of epilepsy patients, they can create a cohort of patients comprising young adults and the elderly. Similarly, researchers can also create a cohort to identify the causal relationship between the quality of life and the socio-economic conditions of patients together with the frequency of epilepsy seizures.
After the researchers identify appropriate research studies (described above), they can interactively create a cohort query by first selecting the query variable from the list of MEW CDEs and then selecting the specific values corresponding to the selected query variable (Figure 2). The user can interactively modify both the query variables (that is, the MEW CDEs) and the values associated with the query variables. Before executing the user-defined query, the Insight query module uses OWL2 reasoning (based on description logic) to “expand the query expression” to improve the quality of results. For example, if the user selects “Autonomic Seizure” as a query constraint, the query module automatically includes all the subtypes of “Autonomic Seizure”, such as “Vasomotor Seizure” and “Bradycardic Seizure”, in the query expression. Hence, this ontology-driven approach uses Semantic Web technology to overcome the limitations of purely syntactic querying, and the resulting study cohort includes all patients who suffer from any category of “Autonomic Seizure”.
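The effect of query expansion can be sketched with a toy class hierarchy. This sketch replaces the OWL2 reasoner with a plain transitive traversal of a child-to-parent dictionary, so it only illustrates the subsumption idea and is not how Insight performs reasoning. The three seizure names come from the text; the root class "Seizure", the patient records, and the function name `expand_query_term` are assumptions.

```python
# Toy subclass hierarchy encoded as child -> parent.
# "Seizure" as the root class is an assumption of this sketch.
SUBCLASS_OF = {
    "Autonomic Seizure": "Seizure",
    "Vasomotor Seizure": "Autonomic Seizure",
    "Bradycardic Seizure": "Autonomic Seizure",
}


def expand_query_term(term, subclass_of=SUBCLASS_OF):
    """Return the term together with all of its direct and indirect subclasses."""
    children = {}
    for child, parent in subclass_of.items():
        children.setdefault(parent, set()).add(child)

    expanded, frontier = set(), [term]
    while frontier:
        current = frontier.pop()
        if current not in expanded:
            expanded.add(current)
            frontier.extend(children.get(current, ()))
    return sorted(expanded)


# Hypothetical patient records: patient id -> recorded seizure type.
patients = {
    "p1": "Vasomotor Seizure",
    "p2": "Bradycardic Seizure",
    "p3": "Autonomic Seizure",
    "p4": "Seizure",
}

expanded_terms = expand_query_term("Autonomic Seizure")
cohort = sorted(pid for pid, seizure in patients.items() if seizure in expanded_terms)
print(expanded_terms)
print(cohort)
```

A purely syntactic match on the string "Autonomic Seizure" would return only patient p3 in this example, whereas the expanded query also returns the patients recorded with one of its subtypes.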
Figure 2. Cohort query using MEW CDEs as query variables.

C. Data Integration and Role-Based Access Control
The Insight data transformation and integration layer uses an ETL approach together with manually created semantic mappings between the research study variables and the MEW CDEs (modeled as EpSO classes). The mappings are maintained manually to ensure consistency with the evolving EpSO classes; if the mappings become invalid due to changes in the structure of EpSO, they are updated in consultation with the researchers of the Neurological Institute (co-authors on this paper). The Insight platform uses role-based access control (RBAC) to ensure that only users approved by the MEW Network centers consortium can access the selected study datasets for approved research projects.
IV. Evaluation Results
This section focuses on evaluating the performance of the Insight platform in terms of data transformation and loading as well as query execution using provenance metadata (inclusion/exclusion criteria) and MEW CDEs. Using a traditional data complexity and query expression complexity approach [25], the evaluations were performed on a server with an Intel(R) Xeon(R) 2.2GHz processor with 6 cores, 8GB RAM and a 1 TB SATA hard drive rated at 7200RPM. Each experiment result is an average of 5 consecutive executions with the first value recorded with cold cache.
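The measurement protocol described above can be sketched as a small timing harness. The function name `average_runtime` and the placeholder workload are assumptions for illustration; the sketch does not clear any cache itself, so the first timing is a cold-cache value only if the caller starts from a cold system.

```python
import time
from statistics import mean


def average_runtime(run_query, repetitions=5):
    """Time consecutive executions of run_query and return (average, timings).

    The first entry of timings corresponds to the first (cold-cache) execution,
    provided the caller has not already warmed the cache.
    """
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return mean(timings), timings


# Placeholder workload standing in for a provenance or cohort query.
average, timings = average_runtime(lambda: sum(range(100_000)))
print(f"average over {len(timings)} runs: {average:.6f} s")
```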
A. Scalability Performance Evaluation Results: Provenance-based Study Selection
Figure 3 illustrates the performance of the provenance queries that select research studies using only inclusion criteria and using both inclusion and exclusion criteria. The number of research studies is increased to demonstrate the scalability of Insight with the volume of data. As expected, the use of both inclusion and exclusion criteria increases the complexity of the query expression because of the greater number of query variables. The default configuration in Insight uses logical disjunction (OR) as the connective between all query variables of the same type of provenance metadata (inclusion as well as exclusion) and logical conjunction (AND) as the connective to compute the final set of research studies.
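The default connectives can be sketched as follows: a study is selected if it matches any of the chosen inclusion criteria AND any of the chosen exclusion criteria. The criteria strings for the TIME and WebEase studies follow the examples in the text; the WebEase exclusion set is left empty because the text does not list it. Treating an empty selection as "no constraint" is an assumption of this sketch, as are the `Study` class and the `select_studies` function.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    name: str
    inclusion: set = field(default_factory=set)
    exclusion: set = field(default_factory=set)


def select_studies(studies, wanted_inclusion=frozenset(), wanted_exclusion=frozenset()):
    """OR within each criteria type, AND between the two types."""
    selected = []
    for study in studies:
        # Disjunction: at least one selected criterion of this type must match.
        inclusion_ok = not wanted_inclusion or bool(study.inclusion & wanted_inclusion)
        exclusion_ok = not wanted_exclusion or bool(study.exclusion & wanted_exclusion)
        # Conjunction between inclusion and exclusion results.
        if inclusion_ok and exclusion_ok:
            selected.append(study.name)
    return selected


studies = [
    Study("TIME",
          inclusion={"older than 18 years of age", "diagnosis of epilepsy"},
          exclusion={"dementia"}),
    Study("WebEase",
          inclusion={"older than 18 years of age", "diagnosis of epilepsy"},
          exclusion=set()),  # not listed in the text, left empty here
]

by_inclusion = select_studies(studies, wanted_inclusion={"older than 18 years of age"})
by_both = select_studies(
    studies,
    wanted_inclusion={"older than 18 years of age"},
    wanted_exclusion={"dementia"},
)
print(by_inclusion)
print(by_both)
```

Adding exclusion criteria adds a second set of query variables to evaluate per study, which is consistent with the higher query expression complexity reported for the combined case.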
Figure 3. Performance evaluation for provenance queries to select MEW research studies.

B. Scalability Performance Evaluation Results: Building Research Cohorts with MEW Data Elements
Once a set of studies is selected using provenance queries, researchers construct study cohorts using the MEW CDEs (modeled as EpSO classes) as described earlier in Section III.B. Figure 4(a) demonstrates the performance of Insight using a single value per MEW CDE (e.g., African American for the CDE Race). The computation time increases with an increasing number of CDEs used in the query expression. Figure 4(a) also shows that the computation time is related to the complexity of the query expression and is not significantly affected as the number of studies increases from 1 to 3 (leading to an increase in the total volume of data queried by the system).
Figure 4. Performance of study cohort queries with increasing number of MEW CDEs (reflecting query expression complexity) and increasing number of research studies.

Similarly, the query performance for increasing complexity of the query expression with multiple values for each CDE demonstrates the scalability of Insight, as the computation times for single-valued and multi-valued CDEs are comparable (Figure 4(a) and Figure 4(b)). Although the query performance recorded for 3 studies is slightly lower than that for a single study when the query expression consists of multiple values for each CDE, we believe this does not reflect any systemic factor associated with query execution in the system and is potentially an artifact. The default configuration of Insight assigns logical disjunction (OR) between all variables of the query expression, based on the suggestion of the clinical researchers. The primary intuition for this choice is that a researcher can fine-tune the composition of a study cohort interactively in the query environment by modifying the values of the query variables and by explicitly applying logical conjunction (AND) between query variables to reduce the size of the cohort according to their requirements.
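The difference between the default disjunction and an explicitly applied conjunction can be sketched with a small cohort filter. The CDE "Race" with value "African American" and the employment status measure are mentioned in the text; the patient records, the value "Employed", and the function name `build_cohort` are hypothetical.

```python
def build_cohort(patients, constraints, connective="OR"):
    """Select patient ids that satisfy the constraints.

    constraints maps each CDE name to the set of accepted values, so a CDE
    can carry a single value or multiple values. With "OR" a patient needs to
    satisfy any one CDE constraint; with "AND" the patient must satisfy all.
    """
    combine = any if connective == "OR" else all
    return [
        patient["id"]
        for patient in patients
        if combine(patient.get(cde) in values for cde, values in constraints.items())
    ]


# Hypothetical de-identified records keyed by CDE name.
patients = [
    {"id": "p1", "Race": "African American", "Employment Status": "Employed"},
    {"id": "p2", "Race": "African American", "Employment Status": "Unemployed"},
    {"id": "p3", "Race": "White", "Employment Status": "Employed"},
    {"id": "p4", "Race": "White", "Employment Status": "Unemployed"},
]
constraints = {"Race": {"African American"}, "Employment Status": {"Employed"}}

broad_cohort = build_cohort(patients, constraints)            # default OR
narrow_cohort = build_cohort(patients, constraints, "AND")    # explicit AND
print(broad_cohort)
print(narrow_cohort)
```

Starting from the broader OR result and then switching selected variables to AND mirrors the interactive narrowing of a cohort described above.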
The results from the three experiments demonstrate that the most significant component of the computational time for query execution is the complexity of the query expression. In addition to query execution, the total time for data transformation and loading needs to be reduced to allow the Insight platform to scale with streaming data from active MEW research studies. We discuss these issues together with other research challenges that are expected to arise in the next phase of development.
V. Discussion and Lessons Learnt
Although the scalability of the current implementation of the Insight platform is adequate for limited study datasets, we plan to use a combination of materialized view and indexing approaches to further improve the performance of user queries. In the Insight platform, the results of high-frequency cohort queries can be materialized to improve the performance of subsequent queries. The materialization approach has been used in the streaming data community to support queries on past snapshots of data, and it will be applicable to querying streaming data from ongoing MEW research studies. Similarly, multi-level indexes can be defined for the mental health and demography related MEW CDEs, which will enable faster lookup of values corresponding to these variables.
In addition to query execution, we have implemented a pilot project that uses the well-known Hadoop MapReduce programming approach to efficiently load neuroscience data into the Hadoop Distributed File System (HDFS). This approach can be used to process and load MEW research study data in parallel while reducing the time required for data transformation. The modular architecture of the Insight platform will allow us to easily replace the existing data integration and transformation layer with a more efficient implementation.
VI. Conclusion
The Insight platform is an innovative approach that combines semantic provenance with ontology-based reasoning to allow integration of complex healthcare data and support secondary analysis to create study cohorts across multicenter research studies. Insight aims to advance clinical research in epilepsy, which is the most common serious neurological disorder affecting 65 million persons worldwide, with access to a large knowledge base created from multiple MEW Network research studies. The Insight platform consists of four components for integrating heterogeneous data from multiple MEW research studies using an epilepsy domain ontology called EpSO and provenance metadata (modeled using the W3C PROV ontology). The performance evaluation of the data transformation, loading, and query modules demonstrates that Insight is a scalable healthcare informatics platform that supports context-based cohort queries.
Acknowledgments
This work is supported in part by the Centers for Disease Control and Prevention (CDC) (grant# SPN00165), and the National Institute of Biomedical Imaging and Bioengineering (NIBIB) Big Data to Knowledge (BD2K) grant (1U01EB020955).
Footnotes
The EpSO OWL file is available at: www.mew.meds.cwru.edu/insight/epso
References
- 1. House TW, editor. Brain Research through Advancing Innovative Neurotechnologies (BRAIN). Washington, D.C.: 2013.
- 2. Human Genome Project (HGP). Available: http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml (retrieved on September 29, 2015)
- 3. Jorgenson LA, Newsome WT, et al. The BRAIN Initiative: developing technology to catalyse neuroscience discovery. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences. 2015;370. doi: 10.1098/rstb.2014.0164.
- 4. Sahoo SS, Zhang GQ, et al. Managing information well: Toward an ontology-driven informatics platform for data sharing and secondary use in epilepsy self-management research centers. Health Informatics Journal. 2015:1–14. doi: 10.1177/1460458215572924.
- 5. Hitzler P, Krötzsch M, et al. OWL 2 Web Ontology Language Primer. World Wide Web Consortium W3C Recommendation. 2009.
- 6. Epilepsy Foundation. Available: http://www.epilepsyfoundation.org/aboutepilepsy/whatisepilepsy/statistics.cfm (retrieved on September 29, 2015)
- 7. Forsgren L, Hauser WA, et al. Mortality of epilepsy in developed countries: a review. Epilepsia. 2005;46:18–27. doi: 10.1111/j.1528-1167.2005.00403.x.
- 8. Hesdorffer DC, Begley CE. Surveillance of epilepsy and prevention of epilepsy and its sequelae: lessons from the Institute of Medicine report. Curr Opin Neurol. 2013;26:168–73. doi: 10.1097/WCO.0b013e32835ef2c7.
- 9. CDC. Behavioral Risk Factor Surveillance System (BRFSS). 2012. Available: http://www.cdc.gov/brfss/about/brfss_today.htm.
- 10. Ciechanowski P, Chaytor N, et al. PEARLS depression treatment for individuals with epilepsy: a randomized controlled trial. Epilepsy & Behavior. 2010;19:225–31. doi: 10.1016/j.yebeh.2010.06.003.
- 11. DiIorio C, Escoffery C, et al. WebEase: development of a Web-based epilepsy self-management intervention. Preventing chronic illness. 2009;6:A28.
- 12. Sahoo SS, Lhatoo SD, et al. Epilepsy and seizure ontology: towards an epilepsy informatics infrastructure for clinical research and patient care. Journal of American Medical Informatics Association. 2014;21:82–9. doi: 10.1136/amiajnl-2013-001696.
- 13. Sahoo SS, Zhao M, et al. OPIC: Ontology-driven Patient Information Capturing System for Epilepsy. American Medical Informatics Association (AMIA) Annual Symposium; Chicago. 2012. pp. 799–808.
- 14. Cui L, Bozorgi A, et al. EpiDEA: Extracting Structured Epilepsy and Seizure Information from Patient Discharge Summaries for Cohort Identification. American Medical Informatics Association (AMIA) Annual Symposium; Chicago. 2012. pp. 1191–1200.
- 15. Jayapandian CP, Chen CH, et al. Electrophysiological Signal Analysis and Visualization using Cloudwave for Epilepsy Clinical Research. The 14th World Congress on Medical and Health Informatics (MedInfo); Copenhagen, Denmark. 2013. pp. 817–21.
- 16. Rosse C, Mejino JL Jr. A reference ontology for biomedical informatics: the Foundational Model of Anatomy. Journal of Biomedical Informatics. 2003;36:478–500. doi: 10.1016/j.jbi.2003.11.007.
- 17. Nelson SJ, Zeng K, et al. Normalized names for clinical drugs: RxNorm at 6 years. J Am Med Inform Assoc. 2011;18:441–8. doi: 10.1136/amiajnl-2011-000116.
- 18. Dou D, Frishkoff G, et al. Development of NeuroElectroMagnetic Ontologies (NEMO): A framework for mining brain wave ontologies. Thirteenth International Conference on Knowledge Discovery and Data Mining (KDD2007); San Jose, CA. 2007. pp. 270–279.
- 19. Lebo T, Sahoo SS, et al. PROV-O: The PROV Ontology. World Wide Web Consortium W3C. 2013.
- 20. Gil Y, Cheney J, et al. Provenance XG Final Report. W3C. 2010.
- 21. Moreau L, Missier P. PROV Data Model (PROV-DM). World Wide Web Consortium W3C. 2013.
- 22. Cheney J, Missier P, et al. Constraints of the PROV Data Model. World Wide Web Consortium W3C. 2013.
- 23. Weng C, Tu SW, et al. Formal representation of eligibility criteria: a literature review. J Biomed Inform. 2010;43:451–67. doi: 10.1016/j.jbi.2009.12.004.
- 24. Sahoo SS, Bodenreider O, et al. Provenance Context Entity (PaCE): Scalable Provenance Tracking for Scientific RDF Data. 22nd International Conference on Scientific and Statistical Database Management (SSDBM); Heidelberg, Germany. 2010. pp. 461–470.
- 25. Vardi M. The Complexity of Relational Query Languages. 14th Ann ACM Symp Theory of Computing (STOC ′82). 1982:137–146.
