NeXML: Rich, Extensible, and Verifiable Representation of Comparative Data and Metadata

2012 Feb 22;61(4):675–689. doi: 10.1093/sysbio/sys025

ASN.1:	Abstract Syntax Notation One (ASN.1) is an object representation language well suited to highly structured data (see McEntire et al. 2000). ASN.1 is used internally at NCBI.
CDAO:	the Comparative Data Analysis Ontology (Prosdocimi et al. 2009; http://www.evolutionaryontology.org/cdao), an ontology developed in Web Ontology Language (OWL) to formalize the concepts and relations used in evolutionary comparative analysis, such as phylogenetic trees, OTUs, and character-state data.
DNS:	Domain Name System, a hierarchical distributed naming system for resources connected to the Internet. DNS is used to translate human-readable names (e.g., nexml.org) into globally unique, numerical addresses used by networking equipment. Such human-readable domain names are often part of GUID schemes such as LSIDs and HTTP URIs.
EvoInfo:	The Evolutionary Informatics Working Group supported by NESCent from 2006 to 2009 (http://evoinfo.nescent.org) spawned a variety of projects, including NeXML, CDAO, and PhyloWS.
GraphML:	a file format for graphs (Brandes et al. 2002). It consists of a language core describing the structural properties of graphs and an extension mechanism to add application-specific data.
GUID:	Globally Unique Identifier, an identifier, that is, a string of text, intended to identify one and only one object (e.g., a concept, a species, a publication). Different schemes have been devised for this, among which are LSIDs, DOIs, and HTTP URIs. A characteristic shared by a number of GUID schemes is that they are frequently a combination of a (sometimes DNS-based) “naming authority” part and a local identifier that is managed by the naming authority.
HTTP:	HyperText Transfer Protocol, the data transfer protocol used on the World Wide Web. HTTP can be used as a technology upon which GUID schemes can be built because it, in turn, builds on a scheme for uniquely identifying addresses (DNS) and because it defines a mechanism for resolving those addresses and returning content, such that information about an object that is identified using an HTTP-based GUID can be looked up.
JSON:	JavaScript Object Notation (http://www.json.org), a lightweight open standard for representing structured data originally based on the syntax for data structures of the JavaScript programming language. XML, which is more verbose, can be translated to JSON, allowing for more concise transmissions of NeXML data in situations where bandwidth is at a premium, for example, inside a web browser window.
LSID:	Life Science Identifier, a means to identify a piece of biological data using a URN scheme (see URI, below) comprising an authority, a namespace, an object identifier, and an optional version number. HTTP URIs serve the same function and are more widely used and supported.
MIAPA:	Minimum Information for a Phylogenetic Analysis, a draft proposal for a MIBBI (Minimum Information for Biological and Biomedical Investigations) standard, specifying the key information for authors to include in a phylogenetic record in order to facilitate the reuse of the phylogenetic data and validation of phylogenetic results.
NCBI:	the National Center for Biotechnology Information (http://ncbi.nlm.nih.gov), part of the United States National Library of Medicine, a branch of the National Institutes of Health. NCBI provides access to biomedical and genomic information. In particular, its databases of DNA sequence data (GenBank, http://www.ncbi.nlm.nih.gov/genbank/) and its taxonomy (http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html) are relevant to the comparative biology community. Due to NCBI's sheer size and longevity, many of the technology choices it made (e.g., for its sequence and taxon identifiers, its sequence file formats) have become de facto standards.
OBO:	Open Biological and Biomedical Ontologies (http://www.obofoundry.org/), a consortium of developers of science-based ontologies with the goal of creating a suite of orthogonal, interoperable reference ontologies in the biomedical domain.
OWL:	Web Ontology Language, a knowledge representation metalanguage for authoring the formal semantics of ontologies, commonly serialized as RDF/XML.
RDF:	Resource Description Framework (Beckett 2004), consisting of a set of W3C specifications for conceptually describing objects (e.g., by their attributes) and the relationship among the objects (e.g., by the changes of their attributes in response to changes in the attributes of other objects). One of the applications of RDF is in the development of database schemas.
RDFa:	Resource Description Framework in attributes, which extends XHTML and other XML formats to allow data described in RDF to be rendered into well-formed XML documents. RDFa therefore bridges RDF to the XML-based web and database world.
RDFS:	RDF Schema, a semantic extension of RDF that defines a set of classes and properties using the RDF language. These classes and properties provide basic elements for the description of RDF vocabularies or ontologies.
RDF/XML:	RDF serialized as XML.
SKOS:	Simple Knowledge Organization System (http://www.w3.org/2004/02/skos/core#), which is a family of formal languages designed for representation of knowledge in the form of trees and networks in specific ontologies (this representation is often achieved through RDF and RDF Schema). SKOS, together with the publication of the SKOS-organized data as web documents and the computational infrastructure for automating the processing of such web documents, makes up the semantic web.
uBio:	the Universal Biological Indexer and Organizer, http://www.ubio.org (see Leary et al. 2007). uBio records canonical names, vernacular names, synonyms and homonyms for biological taxa in its NameBank database, and anchors these recorded names on a number of widely used taxonomies, including the NCBI taxonomy. uBio also provides a number of web services, including ones that query its NameBank for occurrences of provided names (the findIT service).
URI:	Uniform Resource Identifier, which can take two forms, the uniform resource name (URN) and the uniform resource locator (URL). A digital object identifier (DOI) is an example of a URN. For example, a journal paper can have the URN doi:10.1093/molbev/msr005 and the URL http://mbe.oxfordjournals.org/content/early/2011/01/07/molbev.msr005. A URN and a URL are analogous to a person's name and the street address where that person can be found.
W3C:	the World Wide Web Consortium, a standards body that publishes “recommendations” that formally describe technologies used on the World Wide Web, including, for our purposes, OWL, RDF, RDFa, RDFS, RDF/XML, SKOS, XHTML, XML, XPath, XQuery, XSD, and XSLT.
XHTML:	Extensible HyperText Markup Language, an XML-based, stricter version of HTML, the markup language in which pages on the World Wide Web are authored.
XML:	Extensible Markup Language (XML), a metalanguage consisting of a set of rules for encoding data in machine-readable form in user-defined, customized domain languages, of which NeXML is an example.
XPath:	the XML Path Language, a query language for selecting nodes from an XML document, which is represented as a hierarchical, multifurcating tree. The query language facilitates tree traversal by allowing specific nodes in the tree to be selected through a variety of criteria. It is used in XML parsers and other software programs that process XML documents.
XQuery:	a query and functional programming language for XML data, intended to integrate the web and the database seamlessly: when both are based on XML, they can be accessed and processed in the same way. XPath is a component of XQuery.
XSD:	XML Schema Definition, a language for describing the syntax and grammar of an XML-based domain language such as NeXML (see Biron and Malhotra 2004; Fallside and Walmsley 2004; Thompson et al. 2004; for the formal W3C recommendations).
XSLT:	Extensible Stylesheet Language Transformations, which can take an XML document and convert it either into another XML document or into a non-XML document containing either the same or a subset of the information in the original XML document. It does this by applying transformation templates to the nodes that XPath expressions select in a source XML document. For example, a mitochondrial genomic sequence stored in the XML format in GenBank can be rendered by XSLT to another sequence format (e.g., FASTA or HTML for web display) or to another XML file containing a subset of information (e.g., containing only coding sequences in the genome).
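The XML, XPath, and JSON entries above can be illustrated together with a minimal Python sketch. It parses a small XML fragment, selects nodes with XPath expressions, and re-expresses part of the data as more compact JSON. The fragment and its element and attribute names (dataset, otus, otu, tree, node) are hypothetical and deliberately simplified; they do not follow the actual NeXML schema. The helper function names are likewise assumptions made for illustration, and Python's standard xml.etree.ElementTree module supports only a limited subset of XPath.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML fragment. Element and attribute names are
# illustrative only and do NOT follow the actual NeXML schema.
DOC = """
<dataset>
  <otus id="taxa1">
    <otu id="t1" label="Homo sapiens"/>
    <otu id="t2" label="Pan troglodytes"/>
    <otu id="t3" label="Gorilla gorilla"/>
  </otus>
  <tree id="tree1">
    <node id="n1" otu="t1"/>
    <node id="n2" otu="t2"/>
    <node id="n3"/>
  </tree>
</dataset>
"""


def otu_labels(root):
    """Select every <otu> child of <otus> by path and return its label."""
    return [otu.get("label") for otu in root.findall("./otus/otu")]


def tip_nodes(root):
    """Select <node> elements anywhere in the tree that have an 'otu' attribute."""
    return [node.get("id") for node in root.findall(".//node[@otu]")]


def otus_as_json(root):
    """Re-express the attributes of each <otu> element as a compact JSON array."""
    records = [dict(otu.attrib) for otu in root.findall(".//otu")]
    return json.dumps(records, separators=(",", ":"))


root = ET.fromstring(DOC.strip())
print(otu_labels(root))   # ['Homo sapiens', 'Pan troglodytes', 'Gorilla gorilla']
print(tip_nodes(root))    # ['n1', 'n2']
print(otus_as_json(root))
```

The predicate [@otu] is an example of selecting nodes by a criterion rather than by position alone. The JSON output carries the same identifiers and labels as the XML source without the closing-tag overhead.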