Abstract
Researchers commonly use a tabular format to describe and represent clinical study data. The lack of standardization of data dictionaries' metadata elements presents challenges for their harmonization across similar studies and impedes interoperability outside the local context. We propose that representing data dictionaries in the form of standardized archetypes can help to overcome this problem. The Archetype Modeling Language (AML), as developed by the Clinical Information Modeling Initiative (CIMI), can serve as a common format for the representation of data dictionary models. We mapped three different data dictionaries (identified from dbGaP, PheKB and TCGA) onto AML archetypes by aligning dictionary variable definitions with the AML archetype elements. The near-complete alignment of data dictionaries helped map them into valid AML models that captured all data dictionary model metadata. The outcome of the work would help subject matter experts harmonize data models for quality, semantic interoperability and better downstream data integration.
1. Introduction
Spreadsheets and other tabular formats have become the primary mechanism for representing much of the clinical research data that is published today. Their simple format and easy-to-use interface have allowed researchers to customize and manage data with relative ease. Large projects like the Database of Genotypes and Phenotypes (dbGaP)1, 2, the Phenotype Knowledge Base (PheKB)3, 4 and The Cancer Genome Atlas (TCGA)5, 6 have published their data dictionaries in a variety of formats. They have used either delimited-value variants (CSV - comma-separated values) or Extensible Markup Language (XML) schemas, where column headings or XML elements (and associated attributes) carry implicit metadata about the study and its variables. While a simple representation makes it easier to procure, maintain and disseminate datasets and implementation artifacts in these formats, it presents a challenge to the semantic interoperability of the models that these data dictionaries define. These model assertions, even when they are standardized with external vocabulary resources, almost always require additional transformations to be effective and semantically interoperable outside their local context.
There is an emerging need for a standard representation of these data dictionaries to facilitate their harmonization and to achieve greater semantic interoperability7–11. A standard representation for a data dictionary would allow researchers to identify the minimal set of information needed to align their model with other models that describe the same set of semantics. It would also enhance the clarity of the individual model semantics and improve interoperability when a model and its datasets are shared. A standard template for the representation of a data dictionary could greatly reduce the transformations required to move between different environments.
The Archetype Modeling Language (AML)12 is one of the formalisms approved by the HL7 Clinical Information Modeling Initiative (CIMI)13, 14 to describe clinical models. An AML model is represented using the Unified Modeling Language (UML)15. The AML specification is based on elements from HL7 Detailed Clinical Models (DCM)16, OpenEHR ADL Archetypes17, 18, the ISO 11179 Meta Data Repository (MDR)19 and the OMG Common Terminology Services 2 (CTS2)20 standards. AML archetypes employ a constraint-based modeling approach, in which the possible values of a starting reference model are incrementally restricted (constrained) as the target of the model is specialized.
The objective of the study is to create direct mappings between the metadata of three data dictionaries (from dbGaP, PheKB, and TCGA, respectively), including their variable definitions, and the AML constraints on the CIMI Core Reference archetypes21, and to demonstrate the utility of the mappings by implementing the transformation via a suite of semi-automated tools. Treating data dictionary variable definitions as AML constraints both provides a standardized representational form and facilitates their harmonization with other similar archetypes. Based on the mappings, we develop a platform known as D2Refine on top of the open-source OpenRefine platform22. The platform provides an interface that allows users to load data dictionaries in a variety of formats, align them to a standard template, and transform them into CIMI archetypes.
2. Materials and Methods
2.1. Materials
2.1.1. Clinical Study Data Dictionaries
We used three data dictionary formats identified from dbGaP, PheKB and TCGA. The data dictionaries from dbGaP and TCGA are in XML schema format, whereas the PheKB data dictionary is an Excel spreadsheet. Each of the data dictionaries contains a collection of metadata including variable definitions, constraints and links to standard terminologies. Some of these data dictionaries already embed references to standard terminologies, like NCI Thesaurus23 and Common Data Elements from the Cancer Data Standards Registry and Repository (caDSR)24, in most of the models they define.
2.1.2. OMG AML Specification
The AML specification12, an Object Management Group (OMG) standard, is composed of three UML Profiles: the Reference Model Profile (RMP), the Terminology Profile (TP) and the Constraint Profile (CP). The Reference Model Profile defines stereotypes that identify the ground rules for constraining a target UML model. The Terminology Profile provides a set of stereotypes that allow UML models to have multi-lingual names and comments, allow the assignment of multiple external identifiers (e.g. URIs) and allow UML model elements to be associated with their intended semantics through links to elements from external ontologies. The Constraint Profile defines a set of stereotypes that allow the specification of constraints (archetypes) on the target classes, properties and data types in a target UML model. An AML archetype is a set of constraints on a target class in a target reference model. An AML archetype library is a collection of archetypes that constrain one or more classes in a given UML model (the Reference Model or RM). Additional AML documentation includes 1) the AML specification in GitHub (including a collection of normative and non-normative formats of AML artifacts)25, 2) the AML Archetype Examples26, 3) the OpenEHR ADL and AOM Specifications27 and 4) an AML prototype project in GitHub28.
2.1.3. The Reference Model
CIMI has specified a generic reference model, the CIMI Core Reference Model29, which underlies all archetypes published by the CIMI group. CIMI also publishes a set of “reference archetypes”, the CIMI Core Reference Archetypes. These archetypes constrain the CIMI Core Reference Model and form the basis of all other archetypes published by the CIMI group. In the clinical space, these reference archetypes provide basic structures to catalog the majority of the constraint definitions listed in data dictionaries. Figure 1 shows a selection of CIMI Core RM classes and a high-level CIMI Core Reference Archetype, Cluster (an online version available at *,†).
Figure 1.
The CIMI Core Reference Archetype ‘Cluster’ constrains the CIMI Core RM Class ITEM_GROUP
2.1.4. OpenRefine - The Platform
We use an open-source platform known as OpenRefine22, a popular and widely used tool for managing and cleaning up data. OpenRefine provides a mechanism that allows users to programmatically extend its capabilities to add standard templates in a tabular form. The standard template enables users to define mandatory and optional variable definitions for a data dictionary. OpenRefine's built-in reconciliation feature (which is extensible) allows users to link values with external terminology services. The set of constraints in a template can then be programmatically transformed into CIMI archetypes by extending the export/import functionality of OpenRefine. We plan to add these features while keeping the spreadsheet-like interface of OpenRefine intact, which greatly reduces the learning curve for users.
2.2. Method
2.2.1. Create a Standard Template based on AML specification
The first step towards capturing the archetype requirements is the creation of a Standard Template. The Standard Template serves as a bridge between tabular metadata and their AML equivalents. The template provides a minimum set of requirements to be filled by the variable definitions of a data dictionary and its metadata. The Standard Template needs to be implemented only once, which eliminates the need for any additional transforms. For a given data dictionary, all that is needed is to map it to the Standard Template and then transform it into an AML archetype.
The elements in the Standard Template are derived directly from the AML specification profiles and are manually composed as three types of requirements for creating a valid AML archetype and linking its terms to standard terminologies. These requirements are:
Archetype Metadata: archetype library, identification, constrained RM class, archetype specialization, organization, copyright, version and author information
Constraint Definition: constraint identification, constrained type, values, multiplicity, value set, archetype reference
Terminology Bindings: archetype term definitions, references to standardized terminology resources like code systems, concepts, value set and value-set members
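The three requirement groups above can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions: the class and field names are hypothetical and chosen to echo the lists above, and the choice of which metadata fields count as mandatory is ours for illustration, not something the AML specification or D2Refine prescribes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArchetypeMetadata:
    """Archetype-level requirements (hypothetical field names)."""
    library: str                          # archetype library
    archetype_id: str                     # archetype identification
    constrained_rm_class: str             # e.g. the reference archetype "Cluster"
    specializes: Optional[str] = None     # archetype specialization
    organization: Optional[str] = None
    copyright_notice: Optional[str] = None
    version: Optional[str] = None
    author: Optional[str] = None


@dataclass
class ConstraintDefinition:
    """One constraint, typically derived from one variable definition."""
    constraint_id: str
    constrained_type: str
    values: List[str] = field(default_factory=list)
    multiplicity: str = "0..1"
    value_set: Optional[str] = None
    archetype_reference: Optional[str] = None


@dataclass
class TerminologyBinding:
    """Link from an archetype term to a standard terminology resource."""
    term_id: str
    text: str
    code_system: Optional[str] = None
    concept_code: Optional[str] = None


@dataclass
class StandardTemplate:
    metadata: ArchetypeMetadata
    constraints: List[ConstraintDefinition] = field(default_factory=list)
    bindings: List[TerminologyBinding] = field(default_factory=list)

    def missing_required(self) -> List[str]:
        """Return the assumed-mandatory metadata fields that are still empty."""
        required = ("library", "archetype_id", "constrained_rm_class")
        return [name for name in required if not getattr(self.metadata, name)]


# Illustrative instance; the library name is a made-up placeholder.
template = StandardTemplate(
    metadata=ArchetypeMetadata(
        library="example-study",
        archetype_id="pht003255",
        constrained_rm_class="Cluster",
    ),
    constraints=[ConstraintDefinition("SUBJID", "String", multiplicity="1..1")],
)
```

A check such as `template.missing_required()` is one simple way a tool could tell a user which template cells still need to be filled before a transformation is attempted.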
2.2.2. Create mappings between data dictionary elements and standard template elements
We then established the mappings between corresponding data dictionary elements and the standard template elements. The mappings helped us observe how the data dictionary properties align to create a valid archetype while keeping the semantics of the data model represented by a data dictionary intact. The informative section ‘AML-UML Transformation Reference’ of the OMG AML specification, which describes a core set of mappings between ADL and AML, guided the creation of these mappings. As illustrated in Figure 2, a data dictionary is defined by and composed of its study variables. This structure resembles the CIMI Core Reference archetype Cluster (as shown in Figure 1), which can be constrained to create almost any hierarchical collection. A data dictionary maps to an archetype, which is defined by constraining the reference archetype Cluster. Each variable definition is mapped to a leaf-level constraint node (modeled by further constraining Cluster.Element). All these archetypes reside in an Archetype Library, which is mapped from the domain or study of the data dictionary.
Figure 2.

Mapping between Data Dictionary and CIMI Archetype
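The structural mapping described in Section 2.2.2 (study to archetype library, data dictionary to an archetype constraining Cluster, variable to a leaf constraint on Cluster.Element) can be sketched in a few lines. This is an illustrative sketch only: the class names and the `dictionary_to_library` helper are hypothetical and are not part of AML or D2Refine; the study, table and variable names are taken from the dbGaP example used later in the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LeafConstraint:
    """A variable definition, modeled as a constraint on Cluster.Element."""
    name: str
    constrains: str = "Cluster.Element"


@dataclass
class Archetype:
    """A data dictionary, modeled as a constraint on the Cluster archetype."""
    name: str
    constrains: str = "Cluster"
    children: List[LeafConstraint] = field(default_factory=list)


@dataclass
class ArchetypeLibrary:
    """The domain or study that the data dictionaries belong to."""
    name: str
    archetypes: List[Archetype] = field(default_factory=list)


def dictionary_to_library(study: str, tables: Dict[str, List[str]]) -> ArchetypeLibrary:
    """Map {table name: [variable names]} onto the library/archetype/leaf shape."""
    library = ArchetypeLibrary(name=study)
    for table_name, variable_names in tables.items():
        archetype = Archetype(name=table_name)
        archetype.children = [LeafConstraint(name=v) for v in variable_names]
        library.archetypes.append(archetype)
    return library


library = dictionary_to_library(
    "phs000360", {"pht003255": ["SUBJID", "SEX", "AGE_FIRST"]}
)
```

The sketch deliberately stops at names and containment; data types, value constraints and terminology bindings are layered on top of this skeleton by the mappings in Tables 1 to 3.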
2.2.3. Implement using D2Refine platform and tooling
We previously implemented a set of extensions to OpenRefine known as D2Refine30. D2Refine includes features that can represent the Standard Template developed above. We plan to add a new extension to the D2Refine platform that allows us to serialize data dictionaries to and from AML archetypes, and a second to determine whether a given data dictionary meets the requirements. D2Refine provides the extensible reconciliation services needed to standardize, validate and transform data dictionary variable values.
2.2.4. Evaluate with a case study using a sample data dictionary
To test the utility of these mappings, we performed a case study and manually converted a sample data dictionary into AML archetypes. Figure 3 shows a customized dbGaP data dictionary with three variables31. These three variables represent the most common types of constraints we observed in a data dictionary. The example shown here contains an identifier, a value set definition and an interval combined with a coded value.
Figure 3.

A Sample dbGaP data dictionary
3. Results
3.1. The Mappings
The mappings between data dictionary elements and standard template elements are shown in Tables 1, 2, and 3. The second column of each table lists the target properties of the Standard Template, which were derived from various UML stereotypes in the three AML specification profiles. The prefixes RMP, CP and TP indicate the namespace of the AML UML profiles: the Reference Model Profile, Constraint Profile and Terminology Profile respectively. The data dictionary variables and their attributes are described either in a pseudo-XPath syntax or by their column names.
Table 1.
Mapping for the Archetype metadata
| Mapping description for model metadata | Standard Template element (follows the OMG AML specification, in UML): the target AML archetype element to which a data dictionary element is mapped, written as AMLProfile.Stereotype.property | dbGaP (XML Schema) | PheKB (Spreadsheet) | TCGA (XML Schema) |
|---|---|---|---|---|
| The model library | CP.ArchetypeLibrary.name | Study name | File name | Schema name |
| A model in the library | CP.Archetype.name | data table@study id | TABLENAME | Schema name |
| Model definition | CP.ComplexObjectConstraint.name | data table@study id | TABLENAME | Schema name (XSD file base name) |
| Constrained RM class | CP.Constrains | Cluster is constrained | Cluster is constrained | xs:restriction |
| Constraint name | CP.ComplexObjectConstraint.name | variable/name | VARNAME | xs:element@name |
| Constrained RM class property | CP.Constrains | Cluster.Element is constrained | Cluster.Element is constrained | xs:restriction |
| Description | CP.ResourceAnnotationNodeItem | variable/description | VARDESC, COMMENT | xs:annotation/xs:documentation |
Table 2.
Mappings for data types and value constraints
| Mapping description for constrained values and their data types | Standard Template element (follows the OMG AML specification, in UML): the target AML archetype element to which a data dictionary element is mapped, written as AMLProfile.Stereotype.property | dbGaP (XML Schema) | PheKB (Spreadsheet) | TCGA (XML Schema) |
|---|---|---|---|---|
| Data types | RMP.MappedDataType data type or CIMI Reference Model data value type DATA_VALUE | variable/type | TYPE | XML Primitive types, xs:element@ref |
| Values | Redefines or subsets the value of the constrained property (e.g. redefines Cluster.Element.value) | variable/value | FORMATTED VALUE | xs:element@value |
| Encoded values | Constrains CIMI Reference Model data value type CODED_TEXT.code | variable/value@code | RAW VALUE | xs:enumeration |
| Archetype Reference | CP.ArchetypeRoot | N/A | N/A | xs:element@ref |
| Interval Values | UML:Interval constraint | variable/logical-min, variable/logical-max | MIN, MAX | xs:element@minOccurs, xs:element@maxOccurs |
| Temporal Interval Values | UML:Duration constraint | | | |
| Multiplicity/Occurrences | UML:MultiplicityElement | | REQUIRED, REPEATED MEASURE | xs:attribute@use |
Table 3.
Mappings for terminology bindings and value set references
| Mapping description for terminology bindings | Standard Template element (follows the OMG AML specification, in UML): the target AML archetype element to which a data dictionary element is mapped, written as AMLProfile.Stereotype.property | dbGaP (XML Schema) | PheKB (Spreadsheet) | TCGA (XML Schema) |
|---|---|---|---|---|
| Term Identification | TP.ArchetypeTerm.id | variable@code | RAW VALUE | xs:enumeration@value |
| Term Text | TP.IdEntry.text | variable/value | FORMATTED VALUE | xs:element@name |
| Value Set Name | TP.IdEntry.text | variable/name | Worksheet name | xs:element@name |
| Value Set Member Name | TP.ArchetypeTerm.value_set_members | variable/name | Separate value-set Worksheet | xs:enumeration@value |
| Value Set Member Code | TP.ArchetypeTerm.id | variable@code | RAW VALUE | xs:enumeration@value |
| Code System or Value Set Reference | TP.CodeSystemReference, TP.CodeSystemVersionReference, TP.ValueSetReference, TP.ValueSetDefinitionReference | NCI Thesaurus codes found, but other code systems could be used | DOCFILE + SOURCE + SOURCEID | caDSR |
| Concept or Value Set member Reference | TP.ConceptReference, TP.PermissibleValue | variable@code | RAW VALUE | attribute ‘cde’ |
Table 1 lists the first set of Standard Template elements, those for Archetype Metadata: the archetype library (the collection to which an archetype belongs) and the archetype's defining top-level constraint, which constrains an RM class or a parent archetype. The sub-constraints that constrain RM class properties follow the same pattern.
Table 2 shows the mappings that satisfy the requirements for data value types and value constraints, such as intervals and the permissible values of a value set. In addition to the primitive types, the CIMI Core Reference Model defines a range of concrete Data Value Types like COUNT, QUANTITY, CODED_TEXT and others. All these data types are descendants of an abstract data value type, DATA_VALUE. The target archetype constraint constrains an appropriate DATA_VALUE type, which depends on the type of the values associated with a variable in the data dictionary. The ‘ArchetypeRoot’ constraint of an archetype creates a reference to another archetype (composition).
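To make the dbGaP column of Table 2 concrete, the sketch below reads the pseudo-XPath locations (variable/name, variable/type, variable/value@code, and the logical minimum and maximum) from a small XML fragment using Python's standard `xml.etree.ElementTree`. The XML fragment is hypothetical and heavily simplified; its element names (in particular `logical_min` and `logical_max`, written here with underscores), the type strings, the codes and the age range are assumptions for illustration, as are the output key names. It is a sketch of the lookup step, not the D2Refine implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment modeled on the pseudo-XPath in Table 2.
SAMPLE_XML = """
<data_table>
  <variable>
    <name>SUBJID</name>
    <type>string</type>
  </variable>
  <variable>
    <name>SEX</name>
    <type>encoded value</type>
    <value code="1">Male</value>
    <value code="2">Female</value>
  </variable>
  <variable>
    <name>AGE_FIRST</name>
    <type>integer</type>
    <logical_min>0</logical_min>
    <logical_max>120</logical_max>
  </variable>
</data_table>
"""


def variable_to_row(var: ET.Element) -> dict:
    """Collect the Table 2 source fields for one <variable> element."""
    # variable/value@code -> encoded values (code -> display text)
    codes = {
        v.get("code"): (v.text or "").strip()
        for v in var.findall("value")
        if v.get("code") is not None
    }
    # logical minimum/maximum -> interval constraint, if either bound is present
    low = var.findtext("logical_min")
    high = var.findtext("logical_max")
    interval = (low, high) if (low is not None or high is not None) else None
    return {
        "constraint_name": var.findtext("name"),  # variable/name
        "data_type": var.findtext("type"),        # variable/type
        "coded_values": codes,
        "interval": interval,
    }


root = ET.fromstring(SAMPLE_XML.strip())
rows = [variable_to_row(v) for v in root.findall("variable")]
```

Each resulting row corresponds to one line of a tabular template; choosing the DATA_VALUE subtype to constrain (for example CODED_TEXT when `coded_values` is non-empty) would be a separate step driven by `data_type`.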
Table 3 lists the mappings for an archetype's terminology bindings for the terms and codes identified from data dictionary variables. These mappings translate into references to internal and external vocabulary resources (e.g. NCI Thesaurus). The stereotypes of the Terminology Profile combine features of the ISO 11179-3 model and the Common Terminology Services 2 (CTS2) specification. The ISO 11179-3 model guides the identification, designation, definition and value/meaning binding aspects. The CTS2 specification provides the model for Concept, Code System, Code System Version, Value Set and Value Set Definition references.
These mappings were reviewed and verified by a panel of subject matter experts, and their utility was demonstrated by the successful transformation of a test data dictionary into AML archetypes. Figure 4 shows a sample dbGaP data dictionary implemented as an AML archetype. Following the mappings listed in the tables above, we used the AML stereotypes ‘ArchetypeLibrary’, ‘Archetype’, ‘ComplexObjectConstraint’ and ‘EnumeratedValueDomain’ to model the Archetype Library, the Archetype, the constraints and the terms (and terminology bindings) respectively. The archetype named ‘pht003255’ constrains the Cluster reference archetype. It is composed of three sub-constraints, ‘SUBJID’, ‘SEX’ and ‘AGE_FIRST’, which are mapped directly from the data dictionary variable names. The target type of these constraints is mapped to the AML Primitive and Data Value Types. Figure 4 also shows the size and complexity of the resulting UML model for three simple constraint definitions (an online version available at ‡).
Figure 4.
The sample dbGAP data dictionary implemented as an AML archetype
Please note that, for clarity, data dictionary variable identifiers are replaced with identifiers that follow the AOM identification scheme. The AOM specification describes the identifier prefixes id (normal identifier), ac (value-set identifier) and at (value-set member). The number that follows these prefixes helps identify the specialization level quickly.
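As a small illustration of this identification scheme, the sketch below classifies an identifier by its prefix and assigns sequential identifiers to a list of names. It rests on two assumptions that are ours rather than the paper's: that the specialization level can be read as the number of dots in the numeric part (as in openEHR-style codes such as `id1.1`), and that sequential numbering starting at 1 is acceptable. The helper names `parse_aom_code` and `assign_codes` are hypothetical.

```python
import re

# Prefix meanings as described in the text above.
AOM_KINDS = {
    "id": "normal identifier",
    "ac": "value-set identifier",
    "at": "value-set member",
}

# A prefix followed by dot-separated numbers, e.g. "id1", "at2", "id1.1".
_CODE = re.compile(r"^(id|ac|at)(\d+(?:\.\d+)*)$")


def parse_aom_code(code: str) -> dict:
    """Split an AOM-style code into prefix, kind and (assumed) specialization depth."""
    match = _CODE.match(code)
    if match is None:
        raise ValueError(f"not an AOM-style code: {code!r}")
    prefix, number = match.groups()
    # Assumption: each dot in the numeric part marks one specialization level.
    return {"prefix": prefix, "kind": AOM_KINDS[prefix], "depth": number.count(".")}


def assign_codes(prefix: str, names: list, start: int = 1) -> dict:
    """Give each name a sequential code with the given prefix (illustrative only)."""
    if prefix not in AOM_KINDS:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return {name: f"{prefix}{i}" for i, name in enumerate(names, start=start)}


constraint_ids = assign_codes("id", ["SUBJID", "SEX", "AGE_FIRST"])
member_ids = assign_codes("at", ["Male", "Female"])
```

Keeping the original variable name alongside the generated code, as the dictionaries returned here do, preserves the link back to the source data dictionary.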
3.2. Implementation status of the D2Refine transformation tools
The following tasks, which together make up the D2Refine workbench30, are already in progress:
Prepare the OpenRefine platform for configuration and validation of the Standard Template elements
Extend the import/export plugins of OpenRefine to load, transform and persist data dictionaries as AML archetypes
We have already implemented a prototype that transforms a dbGaP data dictionary into OpenEHR's ADL format. A similar extension is required to implement the transformation to AML archetypes.
Configure and extend OpenRefine's reconciliation services with internal vocabulary terms and external terminological resources like NCIT, SNOMED CT, LOINC, caDSR and FHIR Profiles and their implementation
We have an Eclipse Modeling Framework (EMF)32 implementation of the CIMI Core Reference Model, which we are already using.
Recently, we successfully augmented OpenRefine's reconciliation capabilities to work with any CTS2-compliant repository, which enables users to create terminology bindings for data dictionaries. We believe that we are on track to implement these features of the D2Refine Workbench (Figure 5) for real-world use cases in the near future. With workbench support, we expect to be able to map the metadata of a data dictionary to the Standard Template elements, the CIMI Core Reference Model and the CIMI Core Reference archetypes in a practical and effective way.
Figure 5.

The D2Refine Workbench
4. Discussion
Data dictionaries provide the simplest form in which to capture and list metadata entries in a tabular format. The tabular format is usually derived from a spreadsheet, a structured XML document or a set of delimited values arranged as rows and columns. This simple form, though it provides the fastest way to organize and understand datasets, can be very diverse and is rarely portable outside its environment. The encouraging aspect of data dictionaries is that they are all very similar in the way they capture the details of variables. A data dictionary lists variable definitions with their names, data types, value ranges and bindings to local and external resources, which is analogous to how an archetype is composed. A data dictionary does not have a reference model, which an archetype needs in order to define constraints. We found that the reference model component of an archetype definition can easily be substituted, either by organizing the structures of data dictionaries into reference model classes or by using an existing reference model that is sufficient. In addition, the data dictionary formats used in this study come from three large projects: dbGaP, TCGA and PheKB. dbGaP and TCGA are two NIH pilot data commons, and PheKB4 is a catalog of electronic phenotype algorithms and associated data dictionaries that is widely used in the eMERGE Research Network. We believe that the selected data dictionary formats represent current practice in the clinical research community well.
The simplified view of the CIMI Reference Archetype21 Cluster shows a recursive structure definition that constrains the CIMI Reference Model (RM)29 classes ITEM_GROUP and ELEMENT, used for grouping and for representing leaf-level entities respectively. Since data dictionaries do not specify a reference model, we use the CIMI Core Reference Archetypes and constrain them to model data dictionary constraints. In the case of the TCGA data dictionary, which introduces a number of commonly used structure definitions, these definitions were added to the collection of CIMI Core RM types, extending the reference model into a TCGA Reference Model. The creation of the TCGA RM made it possible to correctly map the TCGA data dictionary constraints to the standard template requirements.
The implementation of the standard template not only preserves the existing tabular, spreadsheet-like interface for creating and managing data dictionaries, but also facilitates additional transforms that make them more useful in different environments. The standard template maps directly from the AML profiles and hence guides modelers in making sure that their models have the necessary model elements and can be persisted as archetypes. Archetype modeling is constraint-based modeling in which the constraints are expressed on the reference model classes. Archetypes stand separate from the reference model, which does not change in any way; this introduces flexibility and improves interoperability.
Besides AML, the Archetype Definition Language (ADL)17 is the other formalism approved by HL7 CIMI for describing clinical models. A model in ADL format is described in Object Data Instance Notation (ODIN) text, whereas an AML model is represented in UML. We are also actively developing tools that enable the transformation of data dictionaries into the ADL representation. Data dictionaries transformed into the ADL format can leverage the tools and environment developed for ADL models, which can help harmonize them with thousands of existing ADL archetypes. On the other hand, the transformation of data dictionaries into the AML format of CIMI archetypes (a non-proprietary UML model) is an important gateway to the Model-Driven Architecture (MDA)33. The MDA workflow provides a way to expedite the development of flexible and robust dataset validation tools and user applications. The AML specification is guided by the ADL Object Model (AOM) and includes transformation mappings for moving between ADL and AML. These mappings between ADL and AML are being implemented by the community and will be available in the near future.
The standard template in D2Refine preserves the spreadsheet-like tabular interface that allows users to review data dictionaries; at the same time, its extensible reconciliation services allow model metadata to be bound to standard terminologies and metadata repositories. D2Refine also provides a built-in framework for implementing bidirectional transforms to store and manage data dictionaries using the AML formalism.
The task of creating CIMI AML archetypes requires implementing the AML Profiles and using a UML implementation to create AML's UML artifacts. The open-source EMF32 and Eclipse UML2 libraries are essential to this task. A library of convenient programming interfaces encapsulating the EMF and UML2 implementations is being developed. The ability to access these interfaces from OpenRefine extensions should be sufficient to realize and persist AML archetypes.
The manual process of creating AML archetypes from data dictionaries is impractical, tedious and error-prone. As described earlier, several components are needed to transform data dictionaries programmatically. An open-source EMF implementation of the AML Profiles is crucial for creating AML objects. We have investigated the Model Driven Health Tools (MDHT)34, an Eclipse open-source project, as a way to generate an implementation of the AML profiles (using the OMG AML specification) as a library. We plan to use Eclipse's UML2 libraries to create UML 2.5 objects.
5. Conclusions
In this study, we created mappings between three types of data dictionaries and the standard template elements informed by the AML specification. We have demonstrated that the mappings enable the representation of a data dictionary as a CIMI archetype. We are actively implementing the mappings in the D2Refine platform30 to enable the transformation of data dictionaries into CIMI archetypes. The outcome of our work will enable the standard representation of heterogeneous clinical study data dictionaries, thereby facilitating effective metadata harmonization and downstream data integration, ultimately advancing clinical research studies.
Acknowledgement
This work was supported in part by funding from R01 GM105688, R01 GM103859 and an NCI U01 Project - caCDE-QA (U01 CA180940).
Footnotes
References
- 1. The database of Genotypes and Phenotypes (dbGaP). 2016. [March 10, 2016]; Available from: http://www.ncbi.nlm.nih.gov/gap.
- 2. Tryka KA, Hao L, Sturcke A, Jin Y, Wang ZY, Ziyabari L, et al. NCBI’s Database of Genotypes and Phenotypes: dbGaP. Nucleic acids research. 2014;42(Database issue):D975–9. doi: 10.1093/nar/gkt1211. Epub 2013/12/04.
- 3. PheKB. 2016. [March 10, 2016]; Available from: https://phekb.org/
- 4. Kirby JC, Speltz P, Rasmussen LV, Basford M, Gottesman O, Peissig PL, et al. PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability. Journal of the American Medical Informatics Association: JAMIA. 2016. doi: 10.1093/jamia/ocv202. Epub 2016/03/31.
- 5. TCGA Data Portal. 2016. [March 10, 2016]; Available from: https://tcga-data.nci.nih.gov/tcga/
- 6. TCGA Data Dictionary for Clinical Data. 2016. [March 10, 2016]; Available from: https://tcga-data.nci.nih.gov/docs/dictionary/
- 7. Cho I, Park HA. Evaluation of the expressiveness of an ICNP-based nursing data dictionary in a computerized nursing record system. Journal of the American Medical Informatics Association: JAMIA. 2006;13(4):456–64. doi: 10.1197/jamia.M1982. Epub 2006/04/20.
- 8. Hicken VN, Thornton SN, Rocha RA. Integration challenges of clinical information systems developed without a shared data dictionary. Studies in health technology and informatics. 2004;107(Pt 2):1053–7. Epub 2004/09/14.
- 9. Duda SN, Cushman C, Masys DR. An XML model of an enhanced data dictionary to facilitate the exchange of pre-existing clinical research data in international studies. Studies in health technology and informatics. 2007;129(Pt 1):449–53. Epub 2007/10/04.
- 10. Obeid JS, McGraw CA, Minor BL, Conde JG, Pawluk R, Lin M, et al. Procurement of shared data instruments for Research Electronic Data Capture (REDCap). Journal of biomedical informatics. 2013;46(2):259–65. doi: 10.1016/j.jbi.2012.10.006. Epub 2012/11/15.
- 11. Cunningham SG, Carinci F, Brillante M, Leese GP, McAlpine RR, Azzopardi J, et al. Core Standards of the EUBIROD Project. Defining a European Diabetes Data Dictionary for Clinical Audit and Healthcare Delivery. Methods of information in medicine. 2015;55(2). doi: 10.3414/ME15-01-0016. Epub 2015/12/17.
- 12. Archetype Modeling Language (AML). 2016. [January 22, 2016]; Available from: http://www.omg.org/spec/AML/
- 13. Clinical Information Modeling Initiative (CIMI). 2016. [March 10, 2016]; Available from: http://www.opencimi.org.
- 14. Jiang G, Evans J, Oniki TA, Coyle JF, Bain L, Huff SM, et al. Harmonization of detailed clinical models with clinical study data standards. Methods of information in medicine. 2015;54(1):65–74. doi: 10.3414/ME13-02-0019. Epub 2014/11/27.
- 15. Unified Modeling Language (UML). 2016. [March 10, 2016]; Available from: http://www.uml.org/
- 16. HL7 Detailed Clinical Models. 2016. [January 18, 2016]; Available from: http://wiki.hl7.org/index.php?title=Detailed_Clinical_Models.
- 17. openEHR. 2016. [January 23, 2016]; Available from: http://www.openehr.org/
- 18. Beale T. Archetypes and the EHR. Studies in health technology and informatics. 2003;96:238–44. Epub 2004/04/06.
- 19. ISO/IEC 11179 Metadata Standard. 2016. [March 10, 2016]; Available from: http://metadata-standards.org/11179/
- 20. The OMG Common Terminology Services 2 Standard. 2016. [March 10, 2016]; Available from: http://www.omg.org/spec/CTS2/1.1/
- 21. CIMI Core Reference Archetypes. 2016. [March 10, 2016]; Available from: https://github.com/opencimi/archetypes/tree/master/miniCIMI.
- 22. OpenRefine. 2016. [March 10, 2016]; Available from: http://openrefine.org/
- 23. NCI Thesaurus. 2016. [January 23, 2016]; Available from: https://ncit.nci.nih.gov/ncitbrowser/
- 24. NCI Cancer Data Standards Registry and Repository (caDSR). 2016. [January 20, 2016]; Available from: https://cbiit.nci.nih.gov/ncip/biomedical-informatics-resources/interoperability-and-semantics/metadata-and-models.
- 25. AML Specifications in GitHub. 2016. [July 5, 2016]; Available from: https://github.com/opencimi/AML/tree/master/Specification.
- 26. AML Archetype Examples. 2016. [July 5, 2016]; Available from: https://github.com/opencimi/AML/tree/master/adl-examples-for-aml.
- 27. OpenEHR’s ADL and AOM Specifications. 2016. [July 5, 2016]; Available from: http://www.openehr.org/releases/AM/latest/docs/
- 28. AML Prototype Project. 2016. [July 5, 2016]; Available from: https://github.com/caCDE-QA/D2Refine/tree/master/PrototypeProject.
- 29. CIMI Core Reference Model. 2016. [March 10, 2016]; Available from: https://github.com/opencimi/rm/tree/master/model/Release-3.0.5/UML/AML_RM.
- 30. caCDE-QA D2Refine GitHub Site. 2016. [January 23, 2016]; Available from: https://github.com/caCDE-QA/D2Refine.
- 31. dbGaP Example Data Dictionary. 2016. [March 10, 2016]; Available from: ftp://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs000360/phs000360.v2.p1/pheno_variable_summaries/phs000360.v2.pht003255.v2.MergedSet_Subject_Phenotypes.data_dict.xml.
- 32. Eclipse Modeling Framework. 2016. [March 10, 2016]; Available from: https://eclipse.org/modeling/emf/
- 33. Model Driven Architecture. 2016. [March 10, 2016]; Available from: http://www.omg.org/mda/
- 34. Open Health Tools Model-Driven Health Tools (MDHT). 2016. [January 23, 2016]; Available from: https://http://www.projects.openhealthtools.org/sf/projects/mdht/





