Author manuscript; available in PMC: 2009 Mar 31.
Published in final edited form as: Neuroinformatics. 2008 Oct 29;6(3):161–174. doi: 10.1007/s12021-008-9029-7

Terminology for Neuroscience Data Discovery: Multi-tree Syntax and Investigator-Derived Semantics

Daniel Gardner 1, David H Goldberg 2, Bernice Grafstein 3, Adrian Robert 4, Esther P Gardner 5
PMCID: PMC2663521  NIHMSID: NIHMS94458  PMID: 18958630

Abstract

The Neuroscience Information Framework (NIF), developed for the NIH Blueprint for Neuroscience Research and available at http://nif.nih.gov and http://neurogateway.org, is built upon a set of coordinated terminology components enabling data and web-resource description and selection. Core NIF terminologies use a straightforward syntax designed for ease of use and for navigation by familiar web interfaces, and readily exportable to aid development of relational-model databases for neuroscience data sharing. Datasets, data analysis tools, web resources, and other entities are characterized by multiple descriptors, each addressing core concepts, including data type, acquisition technique, neuroanatomy, and cell class. Terms for each concept are organized in a tree structure, providing is-a and has-a relations. Broad general terms near each root span the category or concept and spawn more detailed entries for specificity. Related but distinct concepts (e.g., brain area and depth) are specified by separate trees, for easier navigation than would be required by graph representation. Semantics enabling NIF data discovery were selected at one or more workshops by investigators expert in particular systems (vision, olfaction, behavioral neuroscience, neurodevelopment), brain areas (cerebellum, thalamus, hippocampus), preparations (molluscs, fly), diseases (neurodegenerative disease), or techniques (microscopy, computation and modeling, neurogenetics). Workshop-derived integrated term lists are available Open Source at http://brainml.org; a complete list of participants is at http://brainml.org/workshops.

Keywords: Neurodatabases, Data sharing, Terminologies, Portals

The Evolution of Scientific Information and the Neuroscience Information Framework

We introduce the core enabling terminologies for the Neuroscience Information Framework (NIF), and view the NIF itself, in the context of access to scientific information. At the dawn of science, information was disseminated via individual letters to a small number of other researchers. Printing technology enabled letters to be collected, assembled in journals, and distributed more widely. Although today an increasingly dominant mode of publication is paperless, with text and illustrations delivered via Net protocols, these still largely take the form of PDF or other page images. Access to this textual material, accompanied by graphical or photographic illustrations, remains conventional, with textual Google or PubMed searches that match exact tokens in publications complementing text-based indexes.

Scientific information is evolving beyond this literature page model. New media include video and 3-D via the Web, and increasingly databases deliver actual datasets, supplementing figures. Beyond neurodatabases, neuroscience web resources include knowledge bases, atlases of structure, expression, and function, genetic/genomic and material resources, and tool and modeling sites for processing, analysis, or simulation of brain data. Such sites span multiple biological scales, techniques, and data models and are often targeted towards communities of neuroscientists that use specific conventions and terminologies (Gardner et al. 2008; Koslow and Hirsch 2004).

With support from the NIH Neuroscience Blueprint Institutes and Centers, we have developed a new initiative for integrating access to and use of web resources. This Neuroscience Information Framework, accessible via http://nif.nih.gov, http://neurogateway.org, and other sites to be announced (Gardner et al. 2008) provides access to data, tools, and materials (as well as text) across scales, methods, and preparations.

Enabling Terminologies for the Neuroscience Information Framework

Framework Core Terminology Is Designed to Span—and Unify—Scales, Domains, and Uses

The NIF consortium wished to avoid a ‘Tower of Babel’ problem in which development was delayed by the many different ways neuroscientists describe the same thing. Humans readily map terms to the concepts they describe, although scope and meaning are often imprecise or ambiguous, but automated methods need the precision provided by terminologies, ontologies, or context-based methods. Moreover, the breadth of neuroscience is such that no single view of neuroscience, and therefore no individual terminology, is sufficient. To serve all neuroscience, we set as a design goal that the Neuroscience Information Framework respect and recognize query semantics serving multiple views of the neuroscience ecosystem (Gardner et al. 2008).

Controlled-Vocabulary Metadata Aid Access to Data or Findings

One goal was to develop terminology to serve the proliferation of web-accessible data and publications, enabling users to specify important features of these data in a consistent manner. Controlled vocabularies (CV), available both for data description by submitters and for queries by those searching for relevant data, avoid lexical mismatch and false negatives. For both submitters and searchers, it is useful to have a comprehensive set of terms from which to select, and to have such terms (semantics) arranged in an informative, useful, and intuitive structure (syntax). It is also a design goal that the semantics serve the needs of multiple communities within neuroscience. To be accurate, the terms must be those used by the neuroscience community or communities generating or recording such data. To be general, they should also be understood by investigators who work with different but related systems, preparations, or techniques, and relatable to broader areas of neuroscience (Gardner et al. 2001a, b). One early such effort, which inspired our work, was the CV keywords developed for the Society for Neuroscience (SfN) by B. Grafstein to aid classification and discovery of abstracts at the Society’s Annual Meeting.

The SfN has been an enabling partner throughout development of NIFv1, the initial version of the NIF. NIFv1 terminology development was aided by the Terminology/Ontology Subcommittee of the Society for Neuroscience’s Neuroinformatics Committee; the Subcommittee included G. Ascoli, J.G. Bjaalie, D. Gardner (Chair), G. Jacobs, and M.E. Martone. The initial charge to the subcommittee was to identify several areas spanning preparations and techniques, to convene experts to establish consensus for terms and for expansion, and to use the results as a template to expand the terminology to more areas of neuroscience. Projected uses of these prototerminology efforts were to enhance search terms for the SfN’s Neuroscience Database Gateway (predecessor to and now a component of the NIF), and to enhance keywords for the Society’s journal J. Neurosci. A longer-term goal, of moving towards an interoperable terminology/ontology for neuroscience, was acknowledged from the start. The SfN supported early workshops in this integrated terminology effort.

NIF terminology development builds on and goes beyond this core vocabulary in the NIF Standardized (NIFSTD) semantic framework, which implements e.g. lexical variants, described in this volume by Bug et al. (2008).

NIFv1 Syntax I: Arranging Terms in Hierarchies Enables Both Broad and Specific Queries and Aids Database Development

Framework core terminologies are primarily a data description language for neuroscience, designed to specify and/or select particular data or findings. Based on this goal, we have selected a straightforward syntax designed for ease of use and for navigation by familiar web interfaces. Datasets, web resources, neuroinformatic software tools, or other entities are characterized by multiple descriptors, each addressing core concepts (e.g., data type, acquisition technique, cell type, and anatomy). Terms, like the keywords that accompany papers or abstracts, are organized in categories, each of which specifies a concept and includes a range of values. These include region or cell class of interest, neurobiological process, relevant disease, the type of data, or the technique by which the data were acquired.

Within a focused domain of neuroscience, it is important to make distinctions between similar locations, cell types, and data records. However, from outside each specialized domain, the distinction between e.g. the cortical areas AITd and AITv may be less relevant than specifying more general terms, such as AIT, or visual/multisensory, or even temporal cortex. For this reason, we arrange the terms describing each neuroscience concept in a tree or hierarchy. The tree structure allows selection of terms at the appropriate level of specificity for both description and search, with broad general terms near each root spawning more detailed entries. Each tree has at its root a set of general terms that broadly span the concept or description; more specific terms derive or branch from these.

Such trees encapsulate is-a and has-a relationships; neuroanatomical representations are largely has-a whereas techniques and data types are primarily is-a. Hierarchies also allow expansion and evolution without rendering prior entries obsolete, provided—as we intend—that the set of top-level terms for each slot span the full range of choices, and new terms are added under former leaf elements.
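The hierarchy described above can be illustrated with a minimal Python sketch. It is not the NIF implementation: the Term class, the expand_query helper, and the exact nesting of the cortical terms are assumptions made for illustration, using only the AITd/AITv example from the text. It shows how a query on a broad term can be expanded to every more specific term beneath it, while a query on a leaf stays specific.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One node in a single-concept terminology tree (hypothetical class)."""
    name: str
    children: list[Term] = field(default_factory=list)

    def add(self, name: str) -> Term:
        """Attach a more specific term beneath this one and return it."""
        child = Term(name)
        self.children.append(child)
        return child

    def find(self, name: str) -> Term | None:
        """Depth-first search for a term by name."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def descendants(self) -> set[str]:
        """This term plus every term below it."""
        names = {self.name}
        for child in self.children:
            names |= child.descendants()
        return names


def expand_query(tree: Term, name: str) -> set[str]:
    """Expand a query term to itself and all of its more specific terms."""
    node = tree.find(name)
    return node.descendants() if node is not None else set()


# Assumed nesting, for illustration only; the real NIFv1 tree may differ.
cortex = Term("temporal cortex")
ait = cortex.add("visual/multisensory").add("AIT")
ait.add("AITd")
ait.add("AITv")

print(sorted(expand_query(cortex, "AIT")))   # ['AIT', 'AITd', 'AITv']
print(sorted(expand_query(cortex, "AITd")))  # ['AITd']
```

Because new terms are only ever attached beneath existing ones, data already described with 'AIT' would still be found by a search on 'temporal cortex' after the tree grows.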

Recognizing the difficulty of attempting to fit terms relating distinct concepts into a single tree, we specify multiple trees, one for each concept or category. For example, one such tree includes brain areas, organized along the neuraxis. Additional trees specify e.g. depth or layer as a part of a location in the brain.

The use of multiple trees rather than a graph representation provides easier navigation for users. The simplicity of tree structures was selected for an additional purpose, to aid adoption of our neuroscientist-generated terms as seed metadata by other projects designing and developing new Web databases for additional neuroscience datasets, preparations, or techniques.
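To suggest why a tree is convenient as seed metadata for a relational database, the following sketch stores one tree as an adjacency list in SQLite and retrieves a subtree with a recursive query. The table name, column names, and numeric ids are assumptions made for this example and are not taken from any NIF schema; the sample terms are borrowed from the technique tree listed later in this paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per term, with a pointer to its parent term.
conn.executescript("""
CREATE TABLE term (
    id        INTEGER PRIMARY KEY,
    tree      TEXT NOT NULL,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES term(id)
);
""")

rows = [
    (1, "technique", "electrode-based", None),
    (2, "technique", "extracellular", 1),
    (3, "technique", "tetrode", 2),
    (4, "technique", "electrode array", 2),
    (5, "technique", "intracellular /whole-cell/clamp", 1),
]
conn.executemany("INSERT INTO term VALUES (?, ?, ?, ?)", rows)


def subtree(conn: sqlite3.Connection, tree: str, name: str) -> list[str]:
    """Return the named term and all terms beneath it, ordered by id."""
    sql = """
    WITH RECURSIVE sub(id, name) AS (
        SELECT id, name FROM term WHERE tree = ? AND name = ?
        UNION ALL
        SELECT t.id, t.name FROM term AS t JOIN sub ON t.parent_id = sub.id
    )
    SELECT name FROM sub ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (tree, name))]


print(subtree(conn, "technique", "extracellular"))
# ['extracellular', 'tetrode', 'electrode array']
```

A single self-referencing table suffices only because each term has exactly one parent; a graph representation would need a separate relation table.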

Gardner et al. (2005) noted that the use of controlled vocabulary and the context provided by the HAV representation enhance the utility and interoperability of metadata, substituting for the natural-language textual context missing from simple CV term lists. As each term is associated with a specific tree that encapsulates related concepts or entities, a text token such as ‘AIP’ can be both a brain area and a protein, and the word ‘grasp’ can be used both as a gene product and a motor action without confusion. Our work acknowledges and benefits from multiple similar organized CV efforts in both related and more general areas of biomedical science (Ashburner et al. 2000; Bota and Arbib 2004; Cimino 1998, 2000; Friedman et al. 1999; Goddard et al. 2001; Greer et al. 2002; Lindberg et al. 1993).
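The disambiguation provided by per-concept trees can be sketched as follows. Each tree is reduced here to a flat set of lower-cased tokens, and the tree names, their membership, and the helper functions are illustrative assumptions rather than the NIFv1 vocabulary. The point is only that a token qualified by its tree is unambiguous even when the bare token is not.

```python
from __future__ import annotations

# Hypothetical trees, flattened to sets of lower-cased tokens.
TREES: dict[str, set[str]] = {
    "brain area": {"aip", "ait"},
    "gene product": {"aip", "grasp"},
    "motor action": {"grasp", "reach"},
}


def trees_containing(token: str) -> list[str]:
    """List every tree in which a bare text token appears."""
    key = token.casefold()
    return sorted(name for name, terms in TREES.items() if key in terms)


def qualified(tree: str, token: str) -> tuple[str, str]:
    """Return an unambiguous (tree, term) pair, or raise if not in that tree."""
    key = token.casefold()
    if key not in TREES.get(tree, set()):
        raise KeyError(f"{token!r} is not a term in the {tree!r} tree")
    return (tree, key)


print(trees_containing("AIP"))    # ['brain area', 'gene product']
print(trees_containing("grasp"))  # ['gene product', 'motor action']
print(qualified("brain area", "AIP") == qualified("gene product", "AIP"))  # False
```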

NIFv1 Syntax II: Detectors and Selectors Specify Web Resources and Contents

Framework terminology efforts are designed towards two important classes of descriptors. One set characterizes the focus of Web-accessible neuroscience resources. The other provides a data-description language enabling searches of individual resources (or a span of resources) for datasets, findings, techniques, tools, or materials of interest.

As a result of these variations in usage, we have found it useful to distinguish between detectors: general terms that specify the domain and contents of a database or other resource (tool repository, analytic engine, etc.) and selectors: query terms that allow specifying desired datasets. We recognize that there are additional, perhaps resource-specific, sets of metadata descriptors, less useful for search. These can include ‘analytical’ or ‘technical’ metadata such as filter settings or classifiers of local significance or useful for audit trails, such as experimenter, date, or local dataset index.

Broad Detector Terms Aid Description and NIF Integration of Disparate Web Resources

The Framework is being designed to offer access to a broad spectrum of Web-accessible resources. Fundamental to the orderly and efficient parsing of queries are terminologies describing such Web resources across multiple dimensions of knowledge or classification. To aid description and characterization of such resources, and to facilitate precise controlled-vocabulary queries, the project derived a list of detectors as neuroscience-aware descriptors of content and focus for the hundreds of resources in the proto-Framework at neurogateway.org. This process distilled a controlled vocabulary for inventoried web resource content from free-text descriptions that were provided by members of the Framework team and colleagues, and subsequently arranged in trees that describe each of several characteristic axes. These terms specify one or more of:

  • Resource description,

  • Neurobiological focus or disease and functional context,

  • Brain structure,

  • Organism,

  • Data type, or

  • Technique.

Figure 1 shows how this detector terminology and the detector query screen were used for resource characterization on the proto-Framework site at http://neurogateway.org. The full NIFv1 detector vocabulary may be accessed at: http://brainml.org/viewVocabulary.do?versionID=782

Fig. 1

The proto-Framework catalog at http://neurogateway.org includes a broad set of detector controlled vocabulary terms that specify resources’ scope and focus, here shown in an early version exposing segments of each of eight controlled-vocabulary detector trees

We list below a sample of this detector terminology: the resource type itself. This characterizes resources by what they provide: databases deliver data, portals deliver links, atlases deliver anatomically- or spatially-organized data, knowledge bases deliver derived, generalized or canonical descriptions, and organization-supported portals deliver neuroscience-related information grouped by subject, disease, company, or institution:

data resource (neuroscience data or findings)
     database (datasets)
     atlas (spatially-organized data)
     knowledge base (findings/knowledge derived from data)
          clinical knowledge base (diagnosis/treatment)
bibliographic resource (library/publisher or literature access)
software resource (software for acquisition, analysis, display or modeling)
     data acquisition software
     data processing/analysis/archiving software
          software for time-series analysis (nonimage)
          software for spatial/image analysis
          software for sequence analysis
          software for pathway analysis
     visualization software
          3D/4D visualization
     modeling/simulation software
research supplies (access to materials)
     instrumentation
     organism
     cell line/tissue
     reagent/chemical
portal (access to people, places, or sites)
     lab or department
     organization or institution
     wide-area portal (links to external neuroscience web sites)
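An indented listing such as the one above can be converted into an explicit parent/child structure with a few lines of code. The sketch below is one assumed approach, not the tooling used by the project: it treats each line's leading-space count as its depth and keeps a stack of the current path. The parse_outline and path_to_root names are invented for this example, and the sample is a short excerpt of the listing.

```python
from __future__ import annotations

SAMPLE = """\
data resource (neuroscience data or findings)
     database (datasets)
     atlas (spatially-organized data)
     knowledge base (findings/knowledge derived from data)
          clinical knowledge base (diagnosis/treatment)
portal (access to people, places, or sites)
     lab or department
"""


def parse_outline(text: str) -> dict[str, str | None]:
    """Map each term to its parent term (None for a root) using indentation."""
    parents: dict[str, str | None] = {}
    stack: list[tuple[int, str]] = []  # (indent, term) along the current path
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        term = line.strip()
        # Drop path entries that are at the same depth or deeper.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parents[term] = stack[-1][1] if stack else None
        stack.append((indent, term))
    return parents


def path_to_root(parents: dict[str, str | None], term: str) -> list[str]:
    """Return the chain of terms from the root down to the given term."""
    path = [term]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    return path[::-1]


parents = parse_outline(SAMPLE)
for step in path_to_root(parents, "clinical knowledge base (diagnosis/treatment)"):
    print(step)
```

Keying the mapping by term text assumes that names are unique within a tree. A listing in which the same name recurs under different branches would need keys based on the full path instead.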

Figure 2 shows a sample search for Neurodatabase.org resources relevant to a specific disease type.

Fig. 2

NIF Detector Terms Search the Neuroscience Web. Neurogateway.org, a NIF prototype resource, provides access to hundreds of neuroscience Web resources. From possible detector search terms for data type, technique, organism, and others, the example shows a search for a specific disease type using selected NIF terminology. The same underlying terminologies seen in Fig. 1 are here shown in an alternate drop-down menu format, emphasizing that the content is adaptable to multiple presentation schemas

Selector Terms Allow General or Specific Searches for Relevant Datasets or Other Resource Contents

A major Framework role is access to data and information provided by the increasing number of Web databases, tool sites, and others. In addition to the detector terminology above, useful for characterizing resources, a much larger set of selectors, again arranged in multiple hierarchies, are needed to specify and distinguish among individual datasets, tools, and findings. In a major section below, we detail the semantic complexity of these selectors and give examples of community-consensus terms derived from a series of expert terminology workshops.

Even with such broad development of specific selector terms, we emphasize that there remains a need for detectors that selector terms cannot themselves serve. A major reason is that the broad focus of individual resources is often implicit, and not specified in selector terms. For instance, all or most of the data in the Framework-accessible fMRIDC Web resource (http://fmridc.org; Van Horn and Gazzaniga 2005) are in fact fMRI data, so ‘fMRI’ is unlikely to appear as a selector term used to distinguish one dataset from another. This reinforces the need for a set of detector terms that are not explicit selector (search) terms, but characterize the specialization, technique, disease, or area of concentration.
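The complementary roles of detectors and selectors can be made concrete with a small two-stage search. Everything in this sketch is hypothetical: the Resource and Dataset classes, the search function, the second resource, the dataset names, and the selector terms are placeholders chosen to mirror the fMRIDC example, not actual catalog contents. Detectors filter resources, and selectors then filter datasets within the resources that remain.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    selectors: set[str]


@dataclass
class Resource:
    name: str
    detectors: set[str]
    datasets: list[Dataset] = field(default_factory=list)


def search(resources: list[Resource],
           detector: str | None = None,
           selector: str | None = None) -> list[tuple[str, str]]:
    """Return (resource, dataset) pairs matching the detector, then the selector."""
    hits = []
    for res in resources:
        if detector is not None and detector not in res.detectors:
            continue  # resource-level filter
        for ds in res.datasets:
            if selector is None or selector in ds.selectors:
                hits.append((res.name, ds.name))  # dataset-level filter
    return hits


# Placeholder catalog: the technique is a detector of the resource,
# so it never appears among the selectors of individual datasets.
resources = [
    Resource("fMRIDC", {"database", "functional [fMRI]"}, [
        Dataset("dataset-1", {"working memory"}),
        Dataset("dataset-2", {"language"}),
    ]),
    Resource("hypothetical EEG archive", {"database", "EEG"}, [
        Dataset("dataset-3", {"working memory"}),
    ]),
]

print(search(resources, selector="functional [fMRI]"))
# []
print(search(resources, detector="functional [fMRI]", selector="working memory"))
# [('fMRIDC', 'dataset-1')]
```

The empty first result is the case described in the text: a selector-only search for the technique misses every dataset in a resource whose entire content was acquired with that technique.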

NIFv1 Semantics: Neuroscientist-Derived Term Sets

Core NIF Terminologies Were Derived by the Neuroscience Community at a Series of Expert Workshops

To aid precise specification and adoption of selector terms, and to aid future neuroinformatic projects in developing compatible data description schemes, the project has used as its major methodology a series of neuroscience terminology workshops. At each by-invitation workshop, experts in a selected domain of neuroscience were brought together for plenary, intensive exchanges toward developing sets of useful and clear selector terminology to describe each of several aspects of experiments, the data they produce, and the analyses and insights that derive from them.

Areas covered span real objects including anatomy and cell types, but participants recognized that anatomy is only one of several necessary components. Others included data types, methods, preparations and protocols, acquisition techniques, post-acquisition data processing, models, diseases, paradigms, and hypotheses. Participants were urged to keep in mind as they identified the concepts and entities important to each area that the terms developed should only be those that investigators working in the field can readily determine and supply, and that the community is willing to accept. We asked that this terminology not only aid the target domain, but also bridge methods and findings with data and knowledge in complementary areas, or gained using complementary techniques. Aiding participation (and adoption), it was stressed that all terminologies, like the rest of the NIFv1 deliverables, will be made available freely Open Source in a non-proprietary manner for universal adoption.

Workshops on invertebrate identified neurons, visual neuroscience I and II, hippocampus I and II, and non-pyramidal cortical neurons were carried out under SfN auspices, funded under private grants and prior NIMH contracts. The Framework added computational neuroscience and modeling, cerebellum, human neuroimaging, microscopy and neuronal ultrastructure, molluscan neurobiology, olfaction: receptors and systems, neurogenetics, neurodegenerative disease, neurodevelopment, thalamus, behavioral neuroscience, and Drosophila.

A complete list of participants is at http://brainml.org/workshops. Many participants agreed to aid future e-mail-based sessions for orderly evolution of terminologies. Post-workshop, each set of trees was edited and the majority of terms integrated in the NIFv1 core terminology; many terms were deferred for incorporation into later versions. NIFv1 trees formed the core of the NIFSTD terminologies described by Bug et al. (2008).

Workshops with Specialized Modalities

The workshop on nonpyramidal neurons was primarily a self-generated effort of several neuroscience communities that came together to codify a multi-dimensional classification scheme (Ascoli et al. 2008). A community-approved terminology for classifying cortical neurons was thus a joint goal of this ‘Petilla nomenclature project’ (named after the meeting site at Cajal’s birthplace), directed by R. Yuste and Framework Project Director G. Ascoli. Framework project members G. Ascoli, W. Bug, D. Gardner, M.E. Martone, and G.M. Shepherd derived from parts of the Petilla nomenclature and other sources a tree with cells classified along one axis (largely morphological), with plans to have the other dimensions or schemes (e.g., molecular or physiological) represented as attributes potentially modifying terms anywhere in the basic tree.

The neuroimaging workshop was primarily devoted to spurring a collegial effort that resulted in the generous donation of several existing vocabularies and initiation of plans for continued cooperative development. Several classes of terms from the computational neuroscience and modeling workshop were reserved pending additional development of the complementary NeuroML (Goddard et al. 2001; Crook et al. 2007) language; these will be included in the forthcoming BrainML08 terminology, along with a tripartite scheme for representing experimental manipulations and protocols.

Multidimensional Selector Controlled Vocabulary

Central to our effort to develop ‘selector’ terminology, which enables individual datasets (or analytic methods, or publications) to be categorized and located via searches, are vocabularies targeted towards datasets. Our scheme parses neurobiological data by three basic sets of terms, and two modifiers. These describe:

what: the neurobiological data type that is recorded or presented,

why: the neurobiological function or disease that the data relate to, and

how: the technique(s) used to acquire or derive the data.

The two modifiers are:

form: an optional modifier if data are presented as an image or a time series, and

origin: an attribute specifying how the data originated, whether from experiment or observation, simulation, or meta-analysis.

These distinct sets of terms are designed to specify the type and significance of data while avoiding the combinatorial explosion that a single tree of terms would require. Note that the terms focus on the neurobiological processes reported by the data and its significance without describing the format in which the data are presented. Similarly, we do not distinguish among closely related measures with similar neurobiological significance, such as currents vs. conductances. Many techniques listed implicitly provide such information. For example, data types include ‘blood oxygenation’ under ‘functional-imaged activity’ whereas fMRI (the technique used for data acquisition) is separately listed under techniques.
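The three term sets and two modifiers can be pictured as fields of a single descriptor record, and the combinatorial argument can be checked with simple arithmetic. In the sketch below, the DataDescriptor class, the enum names, the choice of 'neurodegenerative disease' as the why value, and the tree sizes are all assumptions made for illustration; only 'blood oxygenation' and fMRI come from the example in the text.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Form(Enum):
    IMAGE = "image"
    TIME_SERIES = "time series"


class Origin(Enum):
    EXPERIMENT = "experiment or observation"
    SIMULATION = "simulation"
    META_ANALYSIS = "meta-analysis"


@dataclass(frozen=True)
class DataDescriptor:
    """Hypothetical record: one term per tree plus the two modifiers."""
    what: str                    # neurobiological data type
    why: str                     # neurobiological function or disease
    how: tuple[str, ...]         # one or more techniques
    origin: Origin
    form: Optional[Form] = None  # optional modifier


example = DataDescriptor(
    what="blood oxygenation",
    why="neurodegenerative disease",  # placeholder value
    how=("functional [fMRI]",),
    origin=Origin.EXPERIMENT,
    form=Form.IMAGE,
)


def single_tree_terms(sizes: tuple[int, ...]) -> int:
    """Terms needed if every combination had to be its own entry in one tree."""
    return math.prod(sizes)


def multi_tree_terms(sizes: tuple[int, ...]) -> int:
    """Terms needed when each concept has a separate tree."""
    return sum(sizes)


# Assumed tree sizes (what, why, how), chosen only to show the scaling.
sizes = (50, 40, 150)
print(single_tree_terms(sizes))  # 300000
print(multi_tree_terms(sizes))   # 240
```

With separate trees the vocabulary grows with the sum of the tree sizes rather than their product, which is the combinatorial explosion the multi-tree design avoids.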

We present two sample trees. The first lists techniques:

chemical
      separations
            gel/electrophoresis
                         Southern
                         northern
                         western
            chromatography/HPLC [high-pressure liquid chromatography]
            centrifugation
      spectroscopy
            mass spectrometry [MS]
            circular dichroism [CD]
            absorbency/absorbance/fluorescence
      microdialysis
      NMR [nuclear magnetic resonance]
      calorimetry/microcalorimetry
      radioassay
      Xray crystallography
computer tomography/imaging
      CAT [computer axial tomography]
      MRI [magnetic resonance imaging]
             structural
                   diffusion/diffusion tensor imaging [DTI]
                   manganese enhanced
             functional [fMRI]
             spectrographic [sMRI]
      PET [positron emission tomography]
      SPECT [single photon emission computed tomography]
electrode-based
      extracellular
            single electrode
            tetrode
            electrode array
                  sharp electrode array
                  flat/flexible electrode array
      intracellular /whole-cell/clamp
            voltage-clamp
            patch
      amperometric
            pH
            ion sensitive (non-H ion)
      macroelectrode
            cuff/suction
            field/surface
                   EEG [electroencephalography]
electron microscopy [EM]
      SEM [scanning electron microscopy]
            secondary electron microscopy
            x-ray microscopy
            back scattered electron microscopy
      TEM [transmission electron microscopy]
            high voltage electron microscopy/HVEM
            intermediate voltage electron microscopy/IVEM
            energy filtering/EFTEM
            electron diffraction camera
      STEM [scanning transmission electron microscopy]
genomic/proteomic assays
      expression chip/microarray
      in situ hybridization
            FISH [fluorescent in situ hybridization]
            RNA in situ
      sequence
            BLAST [basic local alignment search tool]
            homology-based search
      CGH [comparative genome hybridization]
      co-immunoprecipitation
             chromatin IPCH
                   chip on chip
      transcript quantitation
            SAGE
            Solexa
            PCR
                 quantitative
                 real time
      structure comparison
probes/markers
      histological staining [to become protocol in BrainML08]
            Golgi/silver stains
            immunocytochemistry
            immunohistochemistry
            myelin
            nissl
            organelle/subcellular markers
            cell death markers
            nuclear markers
            caspases
      labeling [to become protocol in BrainML08]
            radiolabeling
            conformational stain
      reporter assays/dyes
            voltage indicator dyes
            Ca indicator dyes
            fluorescent probes
                  fluro-J
            genetically coded
            enzymatic function reporter(s)
            transcription reporter(s)
      fate mapping
            genetic
            lineage tracing
      birth dating
            thymidine
            deoxyuridine derivatives/brdu
bioactive molecules [to become protocol in BrainML08]
      physiological manipulators
            cell activation
            cell inactivation
      RNAi
      viral vectors
      chemical genetics
light microscopy / optical imaging
      bright field
      dark field
      intrinsic
      fluorescence/chemiluminescence/phosphorescence
            multi-photon
            confocal
            FRET [fluorescence resonance energy transfer]
      Nomarski/DIC [differential interference contrast]
      phase
      polarization
      spectroscopy
            ratio imaging
      stereology
      TIRM [total internal reflection microscopy]
MEG [magnetoencephalography]
mechanical sensors
      displacement transducer
      force transducer / optical tweezers
      audio/acoustic
            spectral
            sound pressure level detector [SPL]
      scanning probe / AFM [atomic force microscopy]
      ultrasound imaging
physiological monitors
      EKG [electrocardiograph]
      polygraph/polysomnograph
      spirometer/respirometer
      thermal monitors
reports/narratives
      clinical reports
      clinical rating scales [use standard nomenclature]
            Barthel stroke ratings
            updrs [Parkinsons]
            uhdrs scale
            Neuropsychological testing
      subject reports
videography and photography
      time-lapse
      videomicroscopy

Other trees specify the structure from which the data were obtained, the level of examination, and the cell type. This neuroanatomy terminology reflects extensive refinement in our thalamus workshop, co-chaired by E.G. Jones and building on work of prior workshops, functional cortical parcellation of Felleman and Van Essen (1991), and NeuroNames (Bowden and Dubach 2003), with partial rationalization by D. Bowden and by E.P. Gardner. In this scheme, we place many neural structures in a single tree, organized along a primary rostral to caudal (or superior to inferior) neuraxis. As the brain is three-dimensional, other conceptual axes are needed for a second physical axis, layering or depth. Terms that are important but which supplement the tree structure, such as ‘ipsilateral’ or ‘contralateral’, are indicated as attributes modifying the tree-selected term or level. Consistent with contemporary usage, terms freely mix Latin (or Greek) derived terms with English. As an example, we provide an excerpt of the primary neuroanatomy tree, using the thalamus to illustrate the overall tree structure and the level of detail for many structures; ellipses (…) mark the remaining 75% of the tree not shown here:

CNS [central nervous system]
    brain
       telencephalon
       . . .
       diencephalon
          pretectum
          thalamus
              epithalamus
                  paraventricular nucleus
                      anterior paraventricular nucleus
                      posterior paraventricular nucleus
                  habenular nucleus
                      medial habenular nucleus (Hm)
                      lateral habenular nucleus (Hl)
                          parvocellular subnucleus (Hlpc)
                          magnocellular subnucleus (Hlmc)
                  pineal/Conarium/Epiphysis
              dorsal thalamus
                  principal nuclei
                      anterior group
                          anterodorsal nucleus (AD)
                          anteroventral nucleus (AV)
                          anteromedial nucleus (AM)/interanteromedial nuc (IAV)
                          lateral dorsal nucleus (LD)
                      medial group
                          parataenial nucleus (Pt)
                          medioventral nucleus (MV)/reuniens nucleus
                          mediodorsal nucleus (MD)
                          magnocellular (medial) division (MDmc)
                          parvocellular (central) division (MDpc)
                          multiform (lateral) div (MDmf)/paralaminar div
                          densocellular/paralamellar division
                      submedial nucleus (Sm)
                  ventral group
                      ventral anterior nucleus (VA)
                          principal division (VAp)
                          magnocellular division (VAmc)
                      ventral lateral nucleus (VL)
                          ventral lateral anterior nucleus (VLa)
                          ventral lateral posterior nucleus (VLp)
                              ventral lateral posterior nucleus dorsal (VLpd)
                              ventral lateral posterior nucleus ventral (VLpv/VIM)
                      ventral posterior complex (VP)
                          ventral posterior lateral nucleus (VPL)
                              VPL nucleus anterior (VPLa/VPS) [Anterodorsal shell]
                              VPL nucleus posterior (VPLp) [Central core]
                      ventral posterior medial nucleus (VPM)
                      ventral posterior inferior nucleus (VPI)
                      parvocellular division of VP complex (VPMpc/VPpc) [Basal ventral medial nucleus (VMb)]
                  ventral medial nucleus (VM)
              lateral posterior/pulvinar complex
                  lateral posterior nucleus (LP)
                  pulvinar nuclei (Pl)
                      anterior pulvinar nucleus (Pla)
                      inferior pulvinar nucleus (Pli)
                      lateral pulvinar nucleus (Pll)
                      medial pulvinar nucleus (Plm)
              posterior group
           posterior nucleus (Po)
               posterior medial nucleus (Pom)
                   posterior lateral nucleus (Pol)
                   posterior intermediate nucleus (Poi)
                   posterior intralaminar nucleus (Pil)
               [for Limitans/suprageniculate nucleus (L/SG) see IL]
           medial geniculate complex
               ventral nucleus (MGv)
               dorsal nucleus (MGd)
                   anterior-dorsal division (MGad)
                   posterior-dorsal division (MGpd)
               medial (magnocellular) nucleus (MGmc) [IL-like zone]
           lateral geniculate nucleus
               dorsal lateral geniculate (LGd)
                  magnocellular layers
                  parvocellular layers
                  koniocellular layers
                  medial interlaminar nucleus
                  A Layers
                  C Layers
           [for ventral lateral geniculate see ventral thalamus]
       intralaminar nuclei (IL)
           anterior intralaminar group
               central/midline/rhomboid nucleus (Rh)/(Ce)
               central medial nucleus (CeM)
               central lateral nucleus (CL)
               paracentral nucleus
           central or midline group [midline ext. of Pt, Rh, CL and CeM
                                                            nuclei]
               intermediodorsal nucleus [rat]
           posterior intralaminar group
               centre median nucleus (CM)
               parafascicular nucleus (Pf)
               subparafascicular nucleus (SPf)
               limitans/suprageniculate nucleus (L/SG)
                   limitans division (L)
                   suprageniculate division (SG)
       ventral thalamus/(Prethalamus)
           reticular nucleus (R)
           zona incerta (ZI)
           nucleus of the field of Forel (FF)
           ventral lateral geniculate complex/pregeniculate nucleus (Prg)
               pars principalis
               medial division
               dorsal cap
               intergeniculate leaflet (IGL)
    subthalamic nucleus
    hypothalamus
       hypophysis
    third ventricle
. . .

Discussion

The Neuroscience Information Framework is built upon a set of coordinated terminology components enabling data and web-resource description and selection. The NIFv1 core terminologies described here form a data description language to specify and select particular neuroscience data or findings, not a true ontology. Its purpose is to provide a set of usable terms in a hierarchy so that investigators recording from, assaying, or otherwise sampling an area or a function of the nervous system can have a set of terms that encompass areas of current and likely future interest. Additional development of ontologies for the NIF is described in the accompanying Bug et al. (2008).

The NIFv1 data description language satisfies the following design goals:

  • It incorporates current usage by those who are not expert in specific areas, such as neuroanatomy, but is informed by the understanding of those who are. Thus the electrophysiologist, the neuroimager, or the molecular biologist needs a context in which to place commonly used descriptive terms in their fields. There is inevitably a tension between common usage of terms such as “pons” and “Broca’s area” and precise definitions, but we recognize that some terms will be used imprecisely and some ambiguously.

  • As different techniques yield, and different experimenters seek, more or less precision of location in the nervous system, the syntax allows for variable specificity. For the purposes of data description, terms are included that describe both broad areas (“parietal cortex” and “lumbar spinal cord”) and very specific locations. These terms are arranged in a tree hierarchy, with the most specific terms the leaves and the most general at the root.

  • Because a researcher looking for data relevant to a question does not know the degree of specificity used to describe a dataset placed in a database, or a finding in the literature, a search using a general term also finds data described by the more specific terms located on its finer branches. As noted above, it would be possible to implement this terminology using graphs rather than trees, allowing multiple inheritance, but this is difficult for casual users to navigate and therefore awkward for the neuroscience community.
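The variable-specificity search described in the design goals above can be illustrated with a minimal Python sketch. It is not the NIF implementation: the `Term` class, the `find` and `search` helpers, and the dataset labels are hypothetical names introduced here, and only the term names are taken from the ventral group of the thalamus tree. A query on a general term returns datasets annotated with that term or with any term on its finer branches.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One node of a single-parent (tree) terminology."""
    name: str
    children: list[Term] = field(default_factory=list)

    def add(self, name: str) -> Term:
        """Attach a more specific term beneath this one and return it."""
        child = Term(name)
        self.children.append(child)
        return child

    def subtree_names(self) -> set[str]:
        """Return this term's name plus the names of all its descendants."""
        names = {self.name}
        for child in self.children:
            names |= child.subtree_names()
        return names


def find(root: Term, name: str) -> Term | None:
    """Depth-first lookup of a term by name; None if it is not in the tree."""
    if root.name == name:
        return root
    for child in root.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def search(root: Term, query: str, annotations: dict[str, str]) -> list[str]:
    """Return datasets annotated with `query` or with any term beneath it."""
    term = find(root, query)
    if term is None:
        return []
    wanted = term.subtree_names()
    return sorted(ds for ds, t in annotations.items() if t in wanted)


# Fragment of the thalamus tree (term names from the list above).
root = Term("ventral group")
root.add("ventral anterior nucleus (VA)")
vl = root.add("ventral lateral nucleus (VL)")
vlp = vl.add("ventral lateral posterior nucleus (VLp)")
vlp.add("ventral lateral posterior nucleus dorsal (VLpd)")

# Hypothetical datasets, each described at a different level of specificity.
annotations = {
    "dataset-A": "ventral lateral posterior nucleus dorsal (VLpd)",
    "dataset-B": "ventral anterior nucleus (VA)",
    "dataset-C": "ventral group",
}

print(search(root, "ventral lateral nucleus (VL)", annotations))
print(search(root, "ventral group", annotations))
```

Because every term has exactly one parent, collecting a subtree is a plain recursive walk; a graph with multiple inheritance would additionally need cycle and duplicate handling.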

In the development of these terminologies, we have recognized that no single scheme can completely encompass the wide range of disparate data types, preparations, or techniques seen in contemporary neuroscience, let alone in likely future development. In particular, we have tried to develop a scheme that can intelligently record and relate what may be similar areas in principal model animals and perhaps aid integrated knowledge of nervous system function. A unified list enables description of and thereby access to data across scales and preparations, one of our contracted goals from the NIH. The alternative to this comprehensive scheme would be a distinct and precise atlas or neuroanatomy for each species; these are of course available for many model animals but to represent each in a NIF-compatible form is beyond the limited scope of this project.

The results of multiple workshops have been integrated in the terminology being developed for the NIF and are also made freely available via Open Source for universal adoption. In this terminology, we have specified many descriptors, and arranged the terms useful to each in hierarchic trees. These terminologies are designed to satisfy such immediate NIF-related goals as identifying the concepts and entities important to specific areas of neuroscience, including data and experimental techniques as well as neurons and preparations. Longer-term goals include stimulating further community adoption of these terms to aid additional development of neuroinformatic resources (Gardner et al. 2003; Kennedy 2004, 2006; Koslow and Hirsch 2004; Liu and Ascoli 2007), and future efforts linking findings obtained in specific areas or preparations, or using particular techniques that yield specific data types, to related or relatable data of different types.

Our current development may therefore be thought of as an index for a book that is still being written. Completeness, defined depending on the level of detail to which each investigator can go or wishes to go, is unattainable, and this is why our syntax represents more specific terms as branches of more general ones. If a very detailed term is not (yet) in the tree, the next level up encompasses it.
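As a minimal sketch of this fallback, assume the tree is stored as child-to-parent links. The `PARENT` table, the `nearest_listed_term` helper, and the finer term "magnocellular layer 1" are hypothetical and introduced only for illustration; the other term names come from the lateral geniculate entries of the thalamus tree. An investigator supplies a path from general to specific, and the annotation falls back to the most specific term the tree already lists.

```python
# Child -> parent links for a fragment of the thalamus tree
# (None marks the root of this fragment).
PARENT = {
    "lateral geniculate nucleus": None,
    "dorsal lateral geniculate (LGd)": "lateral geniculate nucleus",
    "magnocellular layers": "dorsal lateral geniculate (LGd)",
    "parvocellular layers": "dorsal lateral geniculate (LGd)",
}


def nearest_listed_term(path, parent=PARENT):
    """Return the most specific term on `path` that the tree already lists.

    `path` runs from general to specific. The walk stops at the first term
    that is missing from the tree or is not a child of the previous term.
    Returns None if even the first term is unknown.
    """
    best = None
    for term in path:
        if term not in parent:
            break
        if best is not None and parent[term] != best:
            break
        best = term
    return best


# The last entry is a hypothetical finer term that the tree does not yet hold.
proposed = [
    "lateral geniculate nucleus",
    "dorsal lateral geniculate (LGd)",
    "magnocellular layers",
    "magnocellular layer 1",
]
annotation = nearest_listed_term(proposed)
print(annotation)
```

If the finer term is later added to the tree, data annotated at the coarser level remain discoverable, since searches on general terms also reach their branches.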

Increasingly, we believe that ontologies or knowledge bases for neuroscience are only one aspect of the wider problem of representing knowledge by metadata in other fields that directly impact real contemporary data in the neurosciences. One obvious need is for terms that bridge to, and interoperate with, conventional sequence and structure bioinformatics. For example, consider what is needed to classify the different patch clamp data (or action potential shapes or spike train patterns) resulting from manipulations that include changes in promoters, gene sequence, allelic selection, post-translational modification, alterations in protein phosphatases, and more, all of which need to be encoded in appropriate metadata in order to make sense of the data. Companion development of the NIFSTD semantic framework is designed toward this goal (Bug et al. 2008).

Complementary NIFv1 Terminology Components

Although the core NIFv1 terminologies here described do not form an ontology, these terms should inform such development, and as noted above, workshop terms are being integrated with parallel NIF-derived and integrated ontology and terminology components to form NIFSTD (Bug et al. 2008). Similarly, these terms are presented only as defined by context in trees and via common usage; we expect that extensions to this work will provide precise definitions as well. Another NIFv1 terminology project is Caltech’s Textpresso, which parses and extracts terms from a large contemporary neuroscience corpus (Müller et al. 2008). As related in this issue by Marenco et al. (2008), mediators will be able to take OWL-based and purely XML-based schemes and rationalize them probabilistically.

NIFv1 terminology also acknowledges multiple parallel efforts. An informal survey conducted among NIF Team members yielded the following list of other terminology or ontology efforts in the biomedical sciences in which one or more members were involved: Gene Ontology, WormBase, NeuroNames, BrainInfo, GENSAT, Gene Network, fMRIDC, BrainML, Brain Map, W3C BioONT, IUPHAR Nomenclature, Unified Medical Language System, BIRN Ontology, Ontology of Biomedical Investigation, National Center for Biomedical Ontologies, OBO Relations / Foundry, and the International Committee on Cortical Interneuron Nomenclature.

The NIF Terminologies, Like the NIF Itself, Are Designed for Evolution and Migration

In addition to the dynamic inventory of neuroscience Web resources forthcoming at http://nif.nih.gov and http://neurogateway.org, which are annotated using NIF terminologies, terminologies (and code) are available Open Source to enable any interested group, journal, or society to establish, mirror, or enhance a Framework site. An expanding Textpresso literature repository for neuroscience is available at http://textpresso.org/neuroscience and at the sites above. NIFv1 and later term lists will be referenceable at http://brainml.org.

NIF terminologies are expanding. Many selector terms are being enriched through term integration by later workshops. In addition to those described here, terms are being collated to produce vocabulary trees for BrainML08’s protocols and paradigms, post-acquisition data processing, and models, diseases, and hypotheses. Believing that community development of vocabularies by neuroscientists facilitates community acceptance, we have tried to construct a terminology whose utility will itself encourage neuroscientists, in the cooperative spirit of the Open Source movement, to propose additional enhancements or extensions to this work.

Exportable Metadata and Semantic Data Models Aid Database Development as well as Resource Integration

Neurodatabase.org, our Weill-Cornell Laboratory of Neuroinformatics archive for neurophysiology data, now incorporates the Open Source NIFv1 terminology for brain area and other descriptors. As noted above, the neuraxis serves as the main tree for these adoptable Open Source selector terms; other trees (not shown) serve second axes, layer, or depth. This standardizes metadata and can potentially facilitate direct database access via NIF query methods (Fig. 3).

Fig. 3.

Neurodatabase.org, the Laboratory of Neuroinformatics-developed archive of neurophysiology data, now incorporates the Open Source NIFv1 terminology for brain area and other descriptors. Exportable NIF terminology, available at http://brainml.org, standardizes metadata, aids future development of descriptors and query terms for databases, and can facilitate direct database access via NIF query screens
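To illustrate how tree-structured terms might map onto a relational model, the following Python sketch stores several descriptor trees in one adjacency-list table and retrieves datasets with a recursive query. This is an assumption-laden example, not the Neurodatabase.org schema: the table names `term` and `dataset_term`, their columns, the `datasets_under` helper, and the recording labels are all hypothetical. Term names are taken from the intralaminar nuclei entries of the thalamus tree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (
    id        INTEGER PRIMARY KEY,
    tree      TEXT NOT NULL,               -- which descriptor tree, e.g. 'neuraxis'
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES term(id)  -- NULL for the root of a fragment
);
CREATE TABLE dataset_term (
    dataset TEXT NOT NULL,
    term_id INTEGER NOT NULL REFERENCES term(id)
);
""")

conn.executemany("INSERT INTO term VALUES (?, ?, ?, ?)", [
    (1, "neuraxis", "intralaminar nuclei (IL)", None),
    (2, "neuraxis", "posterior intralaminar group", 1),
    (3, "neuraxis", "centre median nucleus (CM)", 2),
    (4, "neuraxis", "parafascicular nucleus (Pf)", 2),
    (5, "neuraxis", "anterior intralaminar group", 1),
    (6, "neuraxis", "central lateral nucleus (CL)", 5),
])

# Hypothetical datasets annotated at different levels of the tree.
conn.executemany("INSERT INTO dataset_term VALUES (?, ?)", [
    ("recording-1", 3),
    ("recording-2", 6),
    ("recording-3", 2),
])

# Collect the queried term and all of its descendants, then match datasets.
SUBTREE_QUERY = """
WITH RECURSIVE subtree(id) AS (
    SELECT id FROM term WHERE tree = ? AND name = ?
    UNION ALL
    SELECT term.id FROM term JOIN subtree ON term.parent_id = subtree.id
)
SELECT DISTINCT dataset
FROM dataset_term
WHERE term_id IN (SELECT id FROM subtree)
ORDER BY dataset
"""


def datasets_under(conn, tree, name):
    """Return datasets annotated with `name` or any term beneath it in `tree`."""
    return [row[0] for row in conn.execute(SUBTREE_QUERY, (tree, name))]


print(datasets_under(conn, "neuraxis", "posterior intralaminar group"))
print(datasets_under(conn, "neuraxis", "intralaminar nuclei (IL)"))
```

Keeping a `tree` column lets separate descriptors such as layer or depth share one table while remaining independent hierarchies, in keeping with the multi-tree approach.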

Information Sharing Statement

All elements of the Framework are open and Open Source. See the NIF at http://nif.nih.gov and http://neurogateway.org; terminologies are at http://brainml.org.

Acknowledgements

This project has been funded in whole or in part through the NIH Blueprint for Neuroscience Research with Federal funds from the National Institute on Drug Abuse, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN271200577531C to Weill Cornell Medical College. BrainML representation and BrainML08 development are supported by MH57153 from NIMH and computational and related metadata partially funded by MH68012 from NIMH, both to Weill Cornell Medical College. Cortical and other mammalian terminology development was aided by NS44820 from NINDS to E.P. Gardner. Early terminology meetings were funded by the Society for Neuroscience under a generous gift from Paul Allen and Jody Patton and under contract (NIH Order No. 263-MD-409125-1) from NIMH, NINDS, and NIDA. We thank the many engaged and productive participants at our expert workshops. These included five Society for Neuroscience Presidents: Michael E. Goldberg, Bernice Grafstein, Edward G. Jones, Pasko Rakic, and David Van Essen, and chairs and co-chairs who in addition to the authors included Gwen Jacobs, Maryann E. Martone, Gordon M. Shepherd, Nick Strausfeld, Jack Van Horn, Robert W. Williams, and Rafa Yuste. Additional help was provided by John H. Byrne, Holly Cline, Katherine Graubard, Ray Guillery, Takao K. Hensch, Steven S. Hsiao, Kei Ito, Harvey Karten, Robert LaMotte, Roger Lemon, Steve Lisberger, Margaret S. Livingstone, Carol A. Mason, George Paxinos, and Joseph L. Price. We had hoped to include the complete set of participants’ names as an Appendix, but at the direction of the Editors these are available only as Supplementary Material at http://brainml.org/workshops. We gratefully acknowledge the professional encouragement and cooperation received from the Society for Neuroscience and the NIF Advisory Committee: H. Akil, G. Ascoli, D. Gardner, B. Grafstein, M.E. Martone, G.M. Shepherd, P. Sternberg, D.C. Van Essen, and R.W. Williams.

Footnotes

Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Contributor Information

Daniel Gardner, Email: dan@med.cornell.edu, Laboratory of Neuroinformatics and Department of Physiology, Weill Medical College of Cornell University, 1300 York Avenue, New York, NY 10065, USA.

David H. Goldberg, Laboratory of Neuroinformatics and Department of Physiology, Weill Medical College of Cornell University, 1300 York Avenue, New York, NY 10065, USA.

Bernice Grafstein, Laboratory of Neuroinformatics and Department of Physiology, Weill Medical College of Cornell University, 1300 York Avenue, New York, NY 10065, USA.

Adrian Robert, Laboratory of Neuroinformatics and Department of Physiology, Weill Medical College of Cornell University, 1300 York Avenue, New York, NY 10065, USA.

Esther P. Gardner, Department of Physiology and Neuroscience, NYU School of Medicine, New York, NY 10016, USA.

References

  1. Ascoli GA, Alonso-Nanclares L, Anderson SA, Barrionuevo G, Benavides-Piccione R, et al. Petilla Interneuron Terminology Group. Petilla terminology: Nomenclature of features of GABAergic interneurons of the cerebral cortex. Nature Reviews Neuroscience. 2008;9:318–324. doi: 10.1038/nrn2402.
  2. Ashburner M, et al. Gene ontology: Tool for the unification of biology. Nature Genetics. 2000;25:25–29. doi: 10.1038/75556.
  3. Bota M, Arbib MA. Integrating databases and expert systems for the analysis of brain structures: Connections, similarities, and homologies. Neuroinformatics. 2004;2:19–58. doi: 10.1385/NI:2:1:019.
  4. Bowden DM, Dubach MF. NeuroNames 2002. Neuroinformatics. 2003;1:43–60. doi: 10.1385/NI:1:1:043.
  5. Bug W, Ascoli GA, Grethe JS, Gupta A, Fennema-Notestine C, Laird A, et al. The NIFSTD and BIRNLex vocabularies: Building comprehensive ontologies for neuroscience. Neuroinformatics. 2008. doi: 10.1007/s12021-008-9032-z.
  6. Cimino JJ. Desiderata for controlled medical vocabularies in the twenty-first century. Methods of Information in Medicine. 1998;37:394–403.
  7. Cimino JJ. From data to knowledge through concept-oriented terminologies: experience with the Medical Entities Dictionary. Journal of the American Medical Informatics Association. 2000;7:288–297. doi: 10.1136/jamia.2000.0070288.
  8. Crook S, Gleeson P, Howell F, Svitak J, Silver RA. MorphML: Level 1 of the NeuroML standards for neuronal morphology data and model specification. Neuroinformatics. 2007;5:96–104. doi: 10.1007/s12021-007-0003-6.
  9. Felleman DJ, Van Essen DC. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex. 1991;1:1–47. doi: 10.1093/cercor/1.1.1-a.
  10. Friedman C, Hripcsak G, Shagina L, Liu H. Representing information in patient reports using natural language processing and the extensible markup language. Journal of the American Medical Informatics Association. 1999;6:76–87. doi: 10.1136/jamia.1999.0060076.
  11. Gardner D, Akil H, Ascoli GA, Bowden DM, Bug W, Donohue DE, et al. The Neuroscience Information Framework: a data and knowledge environment for neuroscience. Neuroinformatics. 2008. doi: 10.1007/s12021-008-9024-z.
  12. Gardner D, Abato M, Knuth KH, DeBellis R, Erde SM. Dynamic publication model for neurophysiology databases. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 2001a;356:1229–1247. doi: 10.1098/rstb.2001.0911.
  13. Gardner D, Abato M, Knuth KH, Robert A. Neuroinformatics for neurophysiology: The role, design, and use of databases. In: Koslow SH, Subramaniam S, editors. Databasing the brain: The role, design, and use of databases. New York: Wiley; 2005. pp. 47–67.
  14. Gardner D, Knuth KH, Abato M, Erde SM, White T, DeBellis R, et al. Common data model for neuroscience data and data model interchange. Journal of the American Medical Informatics Association. 2001b;8:17–31. doi: 10.1136/jamia.2001.0080017.
  15. Gardner D, Toga AW, Ascoli GA, Beatty J, Brinkley JF, Dale AM, et al. Towards effective and rewarding data sharing. Neuroinformatics. 2003;1:289–295. doi: 10.1385/NI:1:3:289.
  16. Goddard NH, Hucka M, Howell F, Cornelis H, Shankar K, Beeman D. Towards NeuroML: Model description methods for collaborative modelling in neuroscience. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 2001;356:1209–1228. doi: 10.1098/rstb.2001.0910.
  17. Greer DS, Westbrook JD, Bourne PE. An ontology driven architecture for derived representations of macromolecular structure. Bioinformatics. 2002;18:1280–1281. doi: 10.1093/bioinformatics/18.9.1280.
  18. Kennedy DN. Barriers to the socialization of information. Neuroinformatics. 2004;2:367–368. doi: 10.1385/NI:2:4:367.
  19. Kennedy DN. Where’s the beef? Missing data in the information age. Neuroinformatics. 2006;4:271–274. doi: 10.1385/NI:4:4:271.
  20. Koslow SH, Hirsch MD. Celebrating a decade of neuroscience databases. Looking to the future of high-throughput data analysis, data integration, and discovery neuroscience. Neuroinformatics. 2004;2:267–270. doi: 10.1385/NI:2:3:267.
  21. Lindberg DAB, Humphreys BL, McCray AT. The unified medical language system. Methods of Information in Medicine. 1993;32:281–291. doi: 10.1055/s-0038-1634945.
  22. Liu Y, Ascoli GA. Value added by data sharing: Long-term potentiation of neuroscience research. Neuroinformatics. 2007;5:143–145. doi: 10.1007/s12021-007-0009-0.
  23. Marenco L, Li Y, Martone ME, Sternberg PW, Shepherd GM, Miller PL. Issues in the design of a pilot concept-based query interface for the Neuroinformatics Information Framework. Neuroinformatics. 2008. doi: 10.1007/s12021-008-9035-9.
  24. Müller H-M, Rangarajan A, Teal TK, Sternberg PW. Textpresso for neuroscience: searching the full text of thousands of neuroscience research papers. Neuroinformatics. 2008. doi: 10.1007/s12021-008-9031-0.
  25. Van Horn JD, Gazzaniga MS. Maximizing information content in shared and archived neuroimaging studies of human cognition. In: Koslow SH, Subramaniam S, editors. Databasing the brain: The role, design, and use of databases. New York: Wiley; 2005. pp. 449–458.
