Abstract
caGrid is a middleware system that combines the Grid computing, service-oriented architecture, and model-driven architecture paradigms to support the development of interoperable data and analytical resources and the federation of such resources in a Grid environment. The functionality provided by caGrid is an essential and integral component of the cancer Biomedical Informatics Grid (caBIG™) program. This program was established by the National Cancer Institute as a nationwide effort to develop enabling informatics technologies for collaborative, multi-institutional biomedical research with the overarching goal of accelerating translational cancer research. Although the main application domain for caGrid is cancer research, the infrastructure provides a generic framework that can be employed in other biomedical research and healthcare domains. The development of caGrid is an ongoing effort, adding new functionality and improvements based on feedback and use cases from the community. This paper provides an overview of potential future architecture and tooling directions and areas of improvement for caGrid and caGrid-like systems. The summary is based on discussions at a roadmap workshop held in February with participants from the biomedical research, Grid computing, and high performance computing communities.
1. Introduction
Collaborative biomedical research studies, which involve the participation of multiple institutions and require integration of disparate, heterogeneous data and analytical resources, have long been hindered by a paucity of interoperable resources and a lack of systems to link them. As a result, rich collections of distributed information resources and the complementary expertise of research groups are underutilized in most biomedical research domains. Recognizing these obstacles and problems in cancer research, the National Cancer Institute (NCI) has established the cancer Biomedical Informatics Grid (caBIG™) program (https://cabig.nci.nih.gov). The objective of this program is to develop enabling informatics technologies for collaborative, multi-institutional biomedical research and to create a voluntary network of cancer centers and research laboratories with the overarching goal of accelerating translational cancer research. The caGrid infrastructure [1–3] is an integral component of caBIG™. caGrid is designed to provide the core infrastructure for federating data and analytical resources and applications deployed at different institutions, within the guidelines and policies accepted by the caBIG™ community, and for enabling researchers to query, integrate, and synthesize information from existing distributed resources as well as to contribute new resources and applications to the caBIG™ environment.
The main application community motivating the need for caGrid is the cancer research community. However, the infrastructure has been designed and implemented as a general middleware system that can support other biomedical application domains. caGrid is built as a service-oriented architecture on top of Grid Service technologies; more specifically, on the Web Services Resource Framework (WSRF) standards (http://www.oasis-open.org/committees/wsrf/)[4]. It also draws from the model-driven architecture paradigm to enable syntactic and semantic interoperability among resources and rich metadata-driven discovery and query of distributed resources. To this end, it leverages and supports the concepts of controlled vocabularies, strongly-typed services, common data elements, published information models, and rich service metadata. In essence, caGrid combines Grid computing, service-oriented architecture, and model-driven architecture in an integrated framework.
caGrid has been released to the community through several versions. Starting with version 1.0, the caGrid infrastructure has been deployed in the caBIG™ environment for production use and as an Enterprise Grid middleware system. The latest version is caGrid 1.2, released in March 2008 and available at the project URL: https://cabig.nci.nih.gov/workspaces/Architecture/caGrid.
caGrid development is an ongoing effort, improving and adding new functionality based on feedback from caGrid users and new requirements from the community. In this paper, we present and discuss a summary of potential future architecture, tooling, and community development directions and areas of improvement for caGrid and caGrid-like biomedical informatics systems. We provide a summary of requirements and proposed directions in six areas: caGrid Service Architecture, Semantic Infrastructure, Security Infrastructure, Support for Federated Query and Orchestration of Grid Services, External Interoperability/Enterprise Systems, and Governance of caGrid Middleware Development. This summary is based on discussions at a roadmap workshop held in February with participants from the biomedical research, Grid computing, and high performance computing communities. More information about the workshop and a more detailed workshop report can be accessed from http://www.cagrid.org/wiki/CaGrid:RoadmapWorkshop.
2. Background: caGrid Architecture
caGrid is a service-oriented Grid software infrastructure. It leverages Grid Services technologies and Grid systems, including the Globus Toolkit (http://www.globus.org)[5, 6] and Mobius[7], and tools developed by the NCI such as the caCORE infrastructure[8, 9]. Each data and analytical resource in caGrid is implemented as a Grid Service, which interacts with other resources and clients using Grid Service protocols. caGrid services are standard WSRF (version 1.2) services and can be accessed by any specification-compliant client. The caGrid infrastructure also includes coordination services, a runtime environment to support the deployment, execution, and invocation of data and analytical services, and tools for easier development of services, management of security, and composition of services into workflows.
A salient characteristic of caGrid, which differentiates it from other Grid middleware systems, is its focus on syntactic and semantic interoperability, driven by the guidelines and requirements developed by the caBIG™ community [10]. caGrid adopts a model-driven architecture approach. Client and service APIs in caGrid represent an object-oriented view of data and analytical resources. These APIs operate on registered domain models, expressed as object classes and relationships between the classes. caGrid leverages existing NCI data modeling infrastructure to manage, curate, and employ the data models. Domain models are defined in the Unified Modeling Language (UML) and converted into common data elements. These common data elements are in turn registered in the Cancer Data Standards Repository (caDSR)[8, 9]. The definitions of these data elements draw from vocabulary registered in the Enterprise Vocabulary Services (EVS)[8, 9]. The data elements and the relationships among them are thus semantically described.
Clients and services communicate through the Grid using messages encoded in XML. In caGrid, when an object is transferred over the Grid between clients and services, it is serialized into an XML document that adheres to an XML schema registered in the Mobius Global Model Exchange (GME) service[7]. As the caDSR and EVS define the properties, relationships, and semantics of caBIG™ data types, the GME defines the syntax of their XML materialization.
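The serialization step described above can be illustrated with a minimal sketch. The `Gene` class, its attributes, and the namespace URI below are hypothetical stand-ins, not actual caBIG™ data types or a real GME schema identifier:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical domain object; real caBIG types are defined in UML and caDSR.
@dataclass
class Gene:
    symbol: str
    taxon: str

# Stand-in for a schema namespace registered in the GME.
NS = "gme://example.cagrid/1.0/gov.nih.nci.domain"

def to_xml(obj: Gene) -> str:
    """Serialize the object into an XML document whose element names
    would be validated against the schema registered under NS."""
    root = ET.Element(f"{{{NS}}}Gene")
    ET.SubElement(root, f"{{{NS}}}symbol").text = obj.symbol
    ET.SubElement(root, f"{{{NS}}}taxon").text = obj.taxon
    return ET.tostring(root, encoding="unicode")

doc = to_xml(Gene("TP53", "Homo sapiens"))
parsed = ET.fromstring(doc)
assert parsed.find(f"{{{NS}}}symbol").text == "TP53"
```

A real caGrid client would not hand-roll this serialization; generated data-binding code performs it against the registered schema, but the schema-anchored round trip is the essential idea.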
caGrid provides extensive support for researchers to discover distributed resources by taking advantage of rich structural and semantic descriptions of data models and services. Each caGrid service is required to provide service metadata. The base metadata contains information about the service-providing cancer center, such as the point of contact and the name of the institution providing the service. It also describes the objects used as input and output of the service's operations. The definitions of the objects themselves are described in terms of their underlying concepts, attributes, attribute value domains, and associations to other exposed objects, as extracted from the caDSR. In addition, the service metadata specifies the operations or methods the service provides, and allows semantic concepts extracted from the EVS to be applied to them. This base metadata is extended for different types of services. For instance, data services comply with an additional "domain model" metadata standard, which details the domain model, including association and inheritance information, from which the objects being exposed by the service are drawn. When a service is deployed, its service metadata is registered with an indexing registry service, called the Index Service. A researcher can discover services of interest by looking them up in this registry. caGrid provides a series of high-level APIs for performing searches on these metadata standards. For instance, a client can search for analytical services that provide operations that take a data type representing a given concept as input.
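The concept-based discovery query at the end of the paragraph above can be sketched as a filter over metadata records. The record shapes, service names, and concept codes here are illustrative inventions, loosely modeled on the caGrid service metadata standard rather than reproducing it:

```python
# Mock registry entries, shaped loosely after caGrid service metadata.
# Names and concept codes are invented for illustration.
services = [
    {"name": "GeneAnnotationService",
     "operations": [{"name": "annotate", "input_concepts": ["C16612"]}]},
    {"name": "ImageAnalysisService",
     "operations": [{"name": "segment", "input_concepts": ["C48179"]}]},
]

def find_services_by_input_concept(registry, concept_code):
    """Return names of services with at least one operation whose
    input data type is annotated with the given concept code."""
    return [s["name"] for s in registry
            if any(concept_code in op["input_concepts"]
                   for op in s["operations"])]

print(find_services_by_input_concept(services, "C16612"))
# → ['GeneAnnotationService']
```

A real client would issue the equivalent query against the Index Service through the caGrid discovery APIs; the point is that semantic annotations, not just interface signatures, drive the match.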
Security is essential for successful deployment of the caGrid environment because of the need to protect the intellectual property of researchers and to ensure the protection and privacy of patient-related information. A comprehensive set of services is provided in caGrid to support secure and controlled access to resources based on policies set forth by the owners of the resources[11]. These services enable Grid-wide management of user credentials, grouping of users into virtual organizations for role-based access control, and management of the trust fabric in the Grid.
3. A Roadmap for caGrid-like Biomedical Informatics Systems
In this section, we present an overview of potential architecture and tooling directions for caGrid-like systems, describe potential areas of improvement, and discuss management models for community-driven development of caGrid and similar biomedical informatics middleware systems. We focus on six core areas: caGrid Service Architecture, Semantic Infrastructure, Security Infrastructure, Support for Federated Query and Orchestration of Grid Services, External Interoperability/Enterprise Systems, and Governance of caGrid Middleware Development.
3.1. caGrid Service Architecture
caGrid is developed as an open source and standards-compliant system. Compliance with standards is important for wider adoption of the infrastructure and integration with other middleware systems. Presently, caGrid employs standards that cover basic interoperability, service, and communication specifications at the service interface layer. These standards need to be augmented by higher-level standards and community-accepted specifications revolving around the semantic description of services and operations. This would allow for greater semantic interoperability of, and programmatic access to, resources exposed through service interfaces. A challenge in adopting additional standards in caGrid will be to achieve seamless integration and interplay of the various standards. In addition, while a base layer of interoperability needs to be established, mapping tools will be needed to support rapid and dynamic integration of Grid service-based efforts conducted by different groups. In essence, a combination of community-accepted interoperability guidelines and standards (such as those supported in caGrid) and the ability to efficiently map between standards will be critical to effective application of biomedical informatics Grid middleware systems.
As for tooling to support Grid-enablement of data sources, caGrid makes use of the caCORE SDK [9] to expose object-oriented views of relational databases as services. There are an increasing number of applications that make use of XML and RDF/OWL based backend systems. These applications will benefit from XML and RDF/OWL data services, and tooling is needed to support easy creation of those types of data services. Mechanisms will be needed to translate between the common query language of caGrid and XML query languages such as XPath and XQuery. With RDF/OWL data services, an added challenge is the incorporation of semantic querying and reasoning capabilities in the environment. Moreover, extensions to the caDSR, EVS, and GME infrastructure will be needed to support publication of RDF/OWL ontology definitions. Support is also needed to easily map Grid-level object models to existing XML and RDF/OWL databases.
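The query-language translation mentioned above can be hinted at with a deliberately trivial sketch. Only a single attribute-equality pattern is handled, and the class name, attribute, and mapping convention are assumptions; caGrid's actual query language supports nested associations and group predicates that a real translator would have to cover:

```python
def cql_to_xpath(target_class: str, attribute: str, value: str) -> str:
    """Translate a single attribute-equality query, in the spirit of
    caGrid's common query language, into an XPath expression over an
    XML store. The attribute-as-XML-attribute mapping is an assumed
    convention for illustration only."""
    local_name = target_class.split(".")[-1]   # e.g. "Gene" from a full class name
    return f'//{local_name}[@{attribute}="{value}"]'

xp = cql_to_xpath("gov.nih.nci.domain.Gene", "symbol", "TP53")
assert xp == '//Gene[@symbol="TP53"]'
```

The hard part of a production translator is not this happy path but preserving the object-model semantics (associations, inheritance) across the object-to-XML mapping.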
An increasing number of biomedical applications involve processing of large volumes of data. Supporting these applications in the Grid requires high performance analytical services, the backends of which should leverage distributed memory clusters, filter/stream based high performance computing, multi-core systems, SMP systems, and parallel file systems. The caGrid team is looking at mechanisms to create gateways between high-performance computing systems and the caGrid environment in order to make it possible to use such systems while maintaining the look and feel of the caGrid environment. Additional effort is needed in the area of common tools that enable high-performance computing (HPC) application authors to describe their performance tuning and job execution parameters and expose them through their Grid service. In addition, an analysis workflow may include services that expose data-parallel HPC applications and that exchange data with each other. To support such workflows, Grid tools and runtime infrastructure are needed to enable parallel MxN data communication operations [12, 13] between two data-parallel services. Here, M is the number of nodes on which one of the data-parallel services executes, and N is the number of nodes on which the other executes.
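The core of an MxN operation is computing which pieces of a distributed data structure each of the M producer nodes must send to each of the N consumer nodes. A minimal sketch for a 1-D array with contiguous block decompositions on both sides (a simplification; real MxN frameworks handle multidimensional and irregular decompositions) follows:

```python
def mxn_schedule(total: int, m: int, n: int):
    """Compute the message schedule for redistributing a 1-D array of
    `total` elements from M producer nodes to N consumer nodes, each
    side using a contiguous block decomposition.
    Returns (src_node, dst_node, start, end) transfers, end exclusive."""
    def block(rank: int, parts: int):
        # Standard block decomposition: first `rem` ranks get one extra.
        base, rem = divmod(total, parts)
        start = rank * base + min(rank, rem)
        return start, start + base + (1 if rank < rem else 0)

    transfers = []
    for src in range(m):
        s0, s1 = block(src, m)
        for dst in range(n):
            d0, d1 = block(dst, n)
            lo, hi = max(s0, d0), min(s1, d1)   # overlap of the two blocks
            if lo < hi:
                transfers.append((src, dst, lo, hi))
    return transfers

# 12 elements redistributed from 3 producers to 2 consumers:
print(mxn_schedule(12, 3, 2))
# → [(0, 0, 0, 4), (1, 0, 4, 6), (1, 1, 6, 8), (2, 1, 8, 12)]
```

In a Grid setting, a runtime would execute this schedule with direct node-to-node transfers between the two services, avoiding a serializing gather/scatter through a single endpoint.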
3.2. Semantic Infrastructure
In order for two entities to correctly interact with each other (a client program with a resource, or a resource with another resource) in a Grid environment, they should agree on both the structure of and the semantic information associated with data object(s) they exchange. An agreement on the data structure is needed so that the program consuming an object produced by the other program can correctly parse the data object. Agreement on the semantic information is necessary so that the consumer can interpret the contents of the data object correctly.
Semantic and syntactic interoperability is a key goal of the caBIG™ program. Thus, the program has established a core framework and a set of interoperability guidelines and practices [10]. The core framework builds on the notions of common data elements, published information models, and controlled vocabularies. The curation and publication of semantically annotated common data elements in caBIG™ is done through a review process that allows freedom of expression for data and tool providers, while still building on a common ontological backbone. This model is both the heart of the success of the caBIG™ approach and its biggest obstacle for would-be adopters and data and tooling providers. Because it relies on community review and curation of every domain model and data type used in the Grid environment, it is likely to introduce bottlenecks as the number of participants, projects, and tools continues to grow. Those particularly susceptible to a high cost are groups working in new domains whose data types and concepts may be partially or largely uncovered by the existing ontology and data models. Scaling the infrastructure and processes to accommodate such communities will be critical to its success. A tempting viewpoint is to simply allow projects to forgo registering their models or anchoring them to the shared ontology. This, however, is likely to create the very silos of non-interoperable applications that caBIG™ and caGrid set out to integrate. Maintaining the high level of integrity necessary for an ontology backbone without centralized control will be both a key challenge and a necessity for scaling toward the next generation of the science Grid.
It is desirable to have a more distributed model for the management of the semantic infrastructure. This includes terminology and ontology management, data model registries, and service and data annotations. The primary motivation is to allow groups to get started with the infrastructure quickly, by allowing them to use local (likely pre-existing) information models and controlled terminologies. In a distributed model, these local models and terminologies could be registered with locally deployed model and terminology servers. In such a scenario, the technology employed could be the same, but the governance and policy for which models and terminologies can be used would be relaxed or removed. The problem then shifts from upfront harmonization to creating processes and procedures for how such local instances can be reviewed, harmonized, approved, or certified. Mechanisms, both technological and policy-oriented, are needed for obtaining this desired harmonization in the face of numerous disparate local resources. It would be important to facilitate the process of mapping terminologies to each other. For example, if an institution utilized a local terminology, and wanted to continue to do so, it could be required to map this terminology to other community-reviewed or standardized terminologies. It is not clear, however, that this would be an efficient long-term solution, or even less work than harmonizing up front. There is also a need to facilitate the capture of community feedback on such items (services, terminologies, models). That is, a primary driver of the desire to remove upfront centralized review is the thought that community consensus and standardization can grow organically from the perceived or measured utility of a given item. Possible mechanisms for capturing this feedback include usage statistics, voting or feedback systems, and referral networks.
For example, if a new un-reviewed terminology or information model is provided, its merit may be judged by capturing information about how many services use it and by community-provided ratings or feedback. A referral network may be an interesting model worth exploring, wherein community-approved domain experts or official reviewers carry more weight in the review of a given item. Infrastructure will need to be developed to capture and manage the ratings, feedback, and other solicited information, as well as the trust model inherent in such a framework. A slightly less extreme position holds that the need for central review of annotations is not the same as the need for central annotation. That is, the governance model need not change drastically in the face of distributed registries; only the technology need change.
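The referral-network weighting described above can be sketched as a simple weighted aggregation. The weights, rating scale, and reviewer names are invented for illustration; a real system would also need the trust model that decides who counts as an approved expert:

```python
def weighted_merit(ratings, reviewer_weight):
    """Aggregate community ratings for an un-reviewed model or
    terminology, weighting community-approved domain experts more
    heavily. `ratings` is a list of (reviewer, score) pairs;
    unlisted reviewers get weight 1.0."""
    total = sum(reviewer_weight.get(who, 1.0) * score
                for who, score in ratings)
    norm = sum(reviewer_weight.get(who, 1.0) for who, _ in ratings)
    return total / norm if norm else 0.0

# Two ordinary community members and one approved expert (3x weight).
ratings = [("alice", 5), ("bob", 3), ("expert1", 4)]
weights = {"expert1": 3.0}
score = weighted_merit(ratings, weights)
assert abs(score - 4.0) < 1e-9   # (5 + 3 + 3*4) / (1 + 1 + 3)
```

Usage statistics (e.g., number of services referencing the model) could be folded in as an additional term; the hard problems are governance and trust, not the arithmetic.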
3.3. Security Infrastructure
The Grid Authentication and Authorization with Reliably Distributed Services (GAARDS) infrastructure of caGrid is a comprehensive set of services and tools for the administration and enforcement of security policy in a Grid environment[11]. Nevertheless, new components are needed to address requirements associated with compliance with federal e-authentication guidelines, compliance with regulatory policy, and the establishment of a Grid-wide user directory.
The caBIG™ program has chosen to adopt the Federal e-Authentication initiative, which provides guidelines for authentication and levels of assurance (http://www.cio.gov/eauthentication/). Additional tools, or extensions to existing tools, will need to be developed for managing and provisioning level of assurance 3 and level 4 accounts. Bringing organizations to levels 2, 3, and 4 presents major challenges and will require the development of best practices, statements of procedures, and tools to aid them in doing so. In many cases, a solution will need to be employed that addresses an individual organization's issues. Once a solution is architected, a scalable framework for evaluating and auditing organizations for compliance with the e-authentication guidelines will also be required.
In the caBIG™ community, authorization and access control requirements vary significantly amongst service providers; because of this, the caBIG™ community has chosen to leave authorization up to individual service providers. GAARDS provides tools that enable service providers to make authorization decisions. However, in a large, federated environment it can be difficult to confidently determine who is behind a given Grid identity. To help alleviate some of these difficulties, a Grid-wide user directory containing accurate information, or attributes, about users is required. In addition, a community effort is required to develop a harmonized set of user attributes that each Identity Provider will be required to provide. Moving forward, the architecture will also need to adapt to support multiple attribute providers per user.
Extensive auditing is required to meet compliance and regulatory guidelines such as 21 CFR Part 11 and HIPAA, which are critical to being able to exchange protected health information (PHI). The importance of auditing in enabling the exchange of PHI is a motivating factor in architecting a community-based approach to auditing. A community working group should be formed to collect auditing requirements and develop an ontology and data model for auditing. Moreover, an auditing infrastructure should be developed for caGrid that would collect auditing information based on the community-approved auditing data model.
3.4. Workflow and Federated Query Support
caGrid provides a workflow management service that supports the execution and monitoring of workflows expressed in the Business Process Execution Language (BPEL) (http://www.ibm.com/developerworks/library/specification/ws-bpel/). The use of BPEL in caGrid facilitates easier sharing and exchange of workflows. Although a powerful language, BPEL has proved difficult for users to grasp. There are frameworks such as WEEP (http://weep.gridminer.org/index.php/Main_Page) that are designed to provide high-level APIs and runtime support for management and execution of BPEL-based workflows. Nevertheless, advanced user interfaces and higher-level support are needed for easy and efficient use of workflows as part of the biomedical research process and for wider adoption of workflow support by end users. Consequently, recent efforts have been made to integrate with the Taverna Workflow Management System [14] (in version 1.2 of the caGrid core infrastructure) as part of a collaboration with the Integrative Cancer Research Workspace of caBIG™. Taverna adds value to the caGrid infrastructure through its more open approach to service components and its use of shims to mediate between incompatible services, which together make a very large number of public services available to caGrid users, as well as through its graphical development environment and wide community uptake. The myExperiment[15] (http://myexperiment.org) social networking site provides a Web 2.0, Facebook-style forum for sharing Taverna workflows that caGrid can leverage.
Another consideration when extending workflow support in caGrid and similar systems is to facilitate the composition of Grid services into workflows without requiring any modifications to the application-specific service code; i.e., the implementation of a service should not depend on whether the service will be part of a workflow. To support this requirement, a workflow helper service, which would coordinate with the workflow management service, would be needed. The helper service would be directly responsible for the integration of an application service into a workflow. It would handle the process of receiving data from upstream services in the workflow, invoking the methods of the application service as specified in the workflow description, and staging the results from service invocations to downstream services. Workflow support should also incorporate provenance data collection functionality, for at least an initial set of parameters, in a non-intrusive way. The idea is to track information about the relationships among the data sets generated during workflow execution and to keep version information.
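The helper-service pattern above can be sketched in a few lines. The service function, step names, and provenance record shape are hypothetical; the point is that the application code (`uppercase_service` here) stays workflow-agnostic while the helper handles staging and provenance:

```python
# Sketch of a "workflow helper" wrapping an unmodified application
# service: it stages the upstream output in, invokes the service,
# records provenance, and hands the result downstream.

provenance = []   # accumulated (step, input, output) records

def uppercase_service(text: str) -> str:
    """Stands in for any deployed application service; note it knows
    nothing about workflows or provenance."""
    return text.upper()

def helper_invoke(step_name, service_fn, upstream_output):
    """Invoke one workflow step on behalf of the workflow manager,
    recording provenance non-intrusively."""
    result = service_fn(upstream_output)
    provenance.append({"step": step_name,
                       "input": upstream_output,
                       "output": result})
    return result   # staged to the next step's helper

out1 = helper_invoke("normalize", uppercase_service, "tp53")
assert out1 == "TP53"
assert provenance[0]["step"] == "normalize"
```

In a deployed system the helper would be a Grid service co-located with the application service, and the provenance records would carry data-set versions and identifiers rather than raw values.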
caGrid provides support for federated querying of multiple data services to enable distributed aggregation and joins on object classes and object associations defined in domain object models. The current support for federated query is aimed at the basic functionality required for data subsetting and integration; extensions are needed to provide more comprehensive capabilities. Scalability of federated query support is important when there are large numbers of clients and queries span large volumes of data and a large number of services. Middleware components need to be developed that will enable distributed execution of queries by using HPC systems available in the environment as well as by carefully creating sub-queries, pushing them to services or groups of services for execution, and coordinating data exchange between services to minimize communication overheads. Another architectural requirement for federated query components is support for semantic queries. Middleware components are needed to support queries that involve selection and join criteria based not only on a data model, which represents the structure and attributes of objects in a dataset, but also on semantic annotations. A key requirement is the ability to support reasoning on ontologies based on description logic (DL) so that a richer set of queries can be executed and answered based on annotations inferred from explicit annotations.
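A distributed join of the kind described above can be reduced to its essentials: each data service answers a sub-query over its own records, and a mediator joins the partial results on a shared attribute. The record shapes and attribute names below are invented; real caGrid federated queries are expressed against registered object models:

```python
# Two mock data services' backing records (invented schemas).
genes = [{"id": 1, "symbol": "TP53"}, {"id": 2, "symbol": "BRCA1"}]
expr  = [{"gene_id": 1, "tissue": "lung",   "level": 8.2},
         {"gene_id": 2, "tissue": "breast", "level": 5.1}]

def query_service(records, predicate):
    """Each service evaluates its sub-query locally (the 'push
    sub-queries to services' step)."""
    return [r for r in records if predicate(r)]

def federated_join(left, right, lkey, rkey):
    """Mediator-side hash join of the two services' partial results."""
    index = {r[rkey]: r for r in right}
    return [{**l, **index[l[lkey]]} for l in left if l[lkey] in index]

hits = query_service(genes, lambda g: g["symbol"] == "TP53")
joined = federated_join(hits, expr, "id", "gene_id")
assert joined[0]["tissue"] == "lung"
```

The scalability work described in the text amounts to doing this at scale: choosing which side to ship, letting services exchange partial results directly, and exploiting HPC resources for the join itself.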
3.5. External Interoperability/Enterprise Systems
While standards facilitate interoperability to a large extent, middleware systems developed using a particular standard and framework are not likely to be able to readily interact with middleware systems developed on top of other standards. For instance, caGrid services are standard WSRF v1.2 services, and WSRF makes use of XML technologies for data representation and exchange. However, WSRF is not directly compatible with HL7 and IHE, which also use XML.
Development of support for integration with other middleware systems and Enterprise systems will need to consider multiple axes of interoperability and address both the requirements associated with consumption of data by a system and the requirements of bidirectional interchange of data. Shared domain semantics, manifested as (published) information models, common data type definitions, common/controlled terminologies, and standards for data type bindings, are clearly at the heart of interoperability. Harmonization of security and policies is another axis. Socio-technical factors, which shape the effective and integrated use of systems, form another axis. Enterprise integration paradigms built around messaging, documents, and services constitute yet another axis of interoperability.
From a middleware implementation point of view, tools and services are needed to enable efficient mappings between the different messaging standards, controlled vocabularies, and data types associated with the many communities involved. Some tools will be needed to harmonize an external standard for data representation with the common data models and ontologies accepted by a community so that semantic interoperability with external systems can be achieved. Other tools and services will need to be implemented as gateways between different protocols to support on-the-fly transformation of messages and resource invocations.
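The kernel of such a gateway is a crosswalk that maps external field names and codes onto a community's common data elements and terminology. The field names, code, and mapping below are invented stand-ins (a real gateway would map, for instance, HL7 message fields onto caBIG™ common data elements backed by EVS concepts):

```python
# Invented crosswalk: external field -> (internal field, code mapping).
CROSSWALK = {
    "dx_code": ("diagnosis", {"C50": "breast carcinoma"}),
}

def translate(external_msg: dict) -> dict:
    """Transform an external message into the internal representation
    on the fly, applying both field renaming and code mapping.
    Unmapped codes pass through unchanged."""
    internal = {}
    for field, value in external_msg.items():
        target_field, code_map = CROSSWALK[field]
        internal[target_field] = code_map.get(value, value)
    return internal

msg = translate({"dx_code": "C50"})
assert msg == {"diagnosis": "breast carcinoma"}
```

A production gateway additionally has to handle unmapped fields, provenance of the transformation, and round-tripping for bidirectional interchange, which is where the harmonization effort described above concentrates.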
It is also recommended that quantifiable (quantitative) metrics be defined and developed to assess interoperability of services and software components. Given that interoperability is a broad subject and spans many possible scenarios, the focus should be on practical usage scenarios. The verification and validation of usage scenarios and interoperability evaluation metrics should also be addressed.
3.6. Governance of Middleware Development
It is important to establish a formal community of stakeholders in any large scale middleware development effort. For caGrid, this includes the end users of applications which build on caGrid, the developers of those applications, the developers of caGrid and related middleware, and the funding bodies with vested interest in the direction, scope, and capabilities of the middleware. Establishing and fostering a community of stakeholders in a formal fashion is critical to the foundation of any potential governance model that would be developed, because the scope and direction of the governance must be derived from the interests of the community. That is, the common interests of the community will be what define the community itself. Another identified key aspect of success, which the fostering of such a community will facilitate, is openness of communication. Lessons identified in a report developed by the NEESgrid team, describing “important lessons we have learned through our experiences in community cyber-environment development, and specifically, through our experience developing one of the first large-scale community cyber-environments.” (http://www.nsf.gov/od/oci/CPMLL.pdf), are good examples of the need for community building and openness of communication.
In any governance model, when setting priorities or direction, the community should have a voice and must be aware of the rationale of the "decision makers," even if a decision is made contrary to their opinion. Such openness can help mitigate or even resolve divergence, which is a common and primary concern for a governance model. If open discussions occur, the whole community is more likely to remain cohesive and come to a mutual understanding of the tradeoffs involved. Even when there are no diverging drivers, openness is critical to end users and application developers. As the primary "customers" of middleware are application developers, providing them insight into what is coming in future releases and what problems the development team is currently working on enables them to better plan for the future. Providing such insight will also facilitate better information exchange and feedback. This allows for a better dialogue between the stakeholders and the developers of the infrastructure and ensures there are fewer surprises or disappointments; it ensures the scope and priorities are vetted with the community.
Another area of importance is how to manage and support contributions to a community-driven middleware system (e.g., caGrid). A contribution is anything that may be of use to the community but is not specifically created by the formal "development team." This may include useful code or documentation developed by external projects, best practices or lessons learned in using the middleware, higher-level or application-oriented code, or even a formal working collaboration with other such middleware projects. The mechanism of contribution may vary greatly with each such contribution. That is, some contributions may warrant full "commit access" for external contributors or adoption and incorporation of the contributed code by the caGrid development team. Others may remain external to the core product, but be officially "blessed" as useful or compatible contributed works. Finally, some may be quite informal and simply identified as "relevant external works" to the community. It is critical that the community adopt a governance model and process which prescribes how such contributions should be addressed. Namely, the categories of contributions (e.g., incorporation, incubation, certification, or reference) need to be identified, and the requirements of those categories should be detailed. For example, the expectations of a contributed work which is to be incorporated into the official distribution of caGrid are likely higher than those of an external work that is simply referenced as relevant to the community. The governance model must also prescribe the process by which contributions are evaluated with respect to the requirements. The resource allocation necessary to support such a model clearly scales with the number of contributions, and appropriate consideration should be made when planning the team's time and scope.
Finally, and probably most importantly, the requirements for a given contribution, and the evaluation against those requirements, should be made publicly available to the community. It is critical that the community know, for example, the support model and the amount of testing or vetting that has taken place for “contributed works.” There are also important considerations from the contributor’s perspective. Specifically, a set of policies and practices needs to be established such that the interests of those contributing works are preserved. Common interests relate to the licensing, accreditation, and branding model being used; that is, for many groups it is important that they maintain their own branding and accreditation for their works.
In terms of release mechanisms for the infrastructure, the “milestone” or feature pre-release mechanism appears to be a valuable way for target groups to provide early feedback on new areas of development. Along the same lines, more modular or smaller, component-oriented releases may provide value to groups working with particular pieces of the infrastructure; however, a clear indication of compatibility with other versions and other pieces of the infrastructure would be critical. It is also critical that a support plan be identified for adopters. Groups are less likely to adopt the infrastructure if there is not a clear long-term support and upgrade plan for them.
4. Conclusions
This paper has summarized a set of recommendations on future architecture and tooling directions for caGrid and similar systems and discussed governance models for community-driven middleware development. These recommendations are based on discussions at a roadmap workshop. While this paper is not a complete description of all requirements and biomedical informatics middleware architecture choices, it provides a good starting point for further discussions on biomedical informatics software requirements, on ideas and directions for the next generation of middleware systems, and on how the management of the development of caGrid-like systems could evolve to better support and foster large-scale, multi-institutional biomedical research, both in caBIG™ and in other areas.
Bibliography
- 1. Oster S, Hastings S, Langella S, Ervin D, Madduri R, Kurc T, Siebenlist F, Foster I, Shanbhag K, Covitz P, Saltz J. caGrid 1.0: A Grid Enterprise Architecture for Cancer Research. Proceedings of the 2007 American Medical Informatics Association (AMIA) Annual Symposium; Chicago, IL. 2007.
- 2. Oster S, Langella S, Hastings S, Ervin D, Madduri R, Phillips J, Kurc T, Siebenlist F, Covitz P, Shanbhag K, Foster I, Saltz J. caGrid 1.0: An Enterprise Grid Infrastructure for Biomedical Research. Journal of the American Medical Informatics Association (JAMIA). 2008. doi: 10.1197/jamia.M2522. Preprint accepted article, published December 20, 2007.
- 3. Saltz J, Oster S, Hastings S, Kurc T, Sanchez W, Kher M, Manisundaram A, Shanbhag K, Covitz P. caGrid: Design and Implementation of the Core Architecture of the Cancer Biomedical Informatics Grid. Bioinformatics. 2006;22:1910–1916. doi: 10.1093/bioinformatics/btl272.
- 4. Foster I, Czajkowski K, Ferguson D, Frey J, Graham S, Maguire T, Snelling D, Tuecke S. Modeling and Managing State in Distributed Systems: The Role of OGSI and WSRF. Proceedings of the IEEE. 2005;93:604–612.
- 5. Foster I. Globus Toolkit Version 4: Software for Service-Oriented Systems. Journal of Computational Science and Technology. 2006;21:523–530.
- 6. Foster I, Kesselman C. Globus: A Metacomputing Infrastructure Toolkit. International Journal of High Performance Computing Applications. 1997;11:115–128.
- 7. Hastings S, Langella S, Oster S, Saltz J. Distributed Data Management and Integration: The Mobius Project. Proceedings of the Global Grid Forum 11 (GGF11) Semantic Grid Applications Workshop; Honolulu, Hawaii, USA. 2004. pp. 20–38.
- 8. Covitz PA, Hartel F, Schaefer C, Coronado S, Fragoso G, Sahni H, Gustafson S, Buetow KH. caCORE: A Common Infrastructure for Cancer Informatics. Bioinformatics. 2003;19:2404–2412. doi: 10.1093/bioinformatics/btg335.
- 9. Phillips J, Chilukuri R, Fragoso G, Warzel D, Covitz PA. The caCORE Software Development Kit: Streamlining construction of interoperable biomedical information services. BMC Medical Informatics and Decision Making. 2006;6. doi: 10.1186/1472-6947-6-2.
- 10. caBIG Compatibility Guidelines. 2005. https://cabig.nci.nih.gov/guidelines_documentation/caBIGCompatGuideRev2_final.pdf.
- 11. Langella S, Oster S, Hastings S, Siebenlist F, Phillips J, Ervin D, Permar J, Kurc T, Saltz J. The Cancer Biomedical Informatics Grid (caBIG™) Security Infrastructure. Proceedings of the 2007 American Medical Informatics Association (AMIA) Annual Symposium; Chicago, IL. 2007.
- 12. Lee J-Y, Sussman A. High performance communication between parallel programs. Proceedings of the 2005 Joint Workshop on High-Performance Grid Computing and High-Level Parallel Programming Models (HIPS-HPGC 2005); IEEE Computer Society Press; 2005.
- 13. Wu JS, Sussman A. Flexible control of data transfers between parallel programs. Proceedings of the Fifth International Workshop on Grid Computing (GRID 2004); IEEE Computer Society Press; 2004. pp. 226–234.
- 14. Hull D, Wolstencroft K, Stevens R, Goble CA, Pocock MR, Li P, Oinn T. Taverna: a tool for building and running workflows of services. Nucleic Acids Research. 2006;34(Web Server issue):729–732. doi: 10.1093/nar/gkl320.
- 15. Roure D, Goble C, Stevens R. Designing the myExperiment Virtual Research Environment for the Social Sharing of Workflows. e-Science 2007 - Third IEEE International Conference on e-Science and Grid Computing; Bangalore, India. December 2007. pp. 603–610.