Abstract
Considerable work has been undertaken by researchers to address the need for temporal data deduction in biomedical applications, but relatively little research has examined how to create robust, efficient implementations of such methods for large databases. We present the design and evaluation of a distributed architecture that can be dynamically optimized to perform large-scale abstraction of temporal data.
INTRODUCTION
We have previously developed a system called RASTA to support deductive reasoning in database applications [1]. RASTA implements the knowledge-based temporal abstraction (KBTA) method [2] for queries over multiple subjects. The program relies on a knowledge base consisting of an ontology with one or more abstraction hierarchies and a detailed specification for each abstraction. A primary challenge in building RASTA is implementing a software architecture for dynamically distributed processing of abstraction hierarchies. We report on the design and evaluation of our architecture.
METHOD
We first examined how the data-processing steps defined by an abstraction hierarchy can be flexibly implemented using a single processor. We found that these tasks can be divided into 3 subtasks, which can be delegated to separate Worker (W) threads. In addition, we need a Manager (M) processor to be responsible for overall control of data processing. The Manager starts up the Workers and a Feeder (F), thereafter interacting with them only when needed. The Manager provides each Worker with a ‘bundle’ of information pertaining to the particular abstraction for which the Worker will be responsible.
The Feeder is responsible for retrieving data from a relational database. It passes the data to the Workers responsible for primitive parameters (i.e., the Workers that work directly with raw data values). As Workers finish their subtasks (i.e., after they have completed their abstractions), they pass the abstracted values to the next Worker in the Worker hierarchy. This system can be used in an interactive query mode as well as a bulk data processing mode.
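The Manager, Feeder, and Worker roles described above can be sketched as a small threaded pipeline. This is only an illustration under stated assumptions, not the RASTA implementation: the class name `AbstractionPipeline`, the queue-based hand-off, the end-of-stream marker, the threshold of 1000.0, and the labels `HIGH`/`LOW`/`ALERT`/`OK` are all hypothetical, the in-memory list stands in for the relational database, and the point-wise mappings stand in for real temporal abstractions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

final class AbstractionPipeline {

    /** A Worker applies one abstraction and forwards results to the next Worker. */
    static final class Worker<I, O> implements Runnable {
        private final BlockingQueue<Optional<I>> in;
        private final BlockingQueue<Optional<O>> out;
        private final Function<I, O> abstraction;

        Worker(BlockingQueue<Optional<I>> in, BlockingQueue<Optional<O>> out,
               Function<I, O> abstraction) {
            this.in = in;
            this.out = out;
            this.abstraction = abstraction;
        }

        @Override
        public void run() {
            try {
                while (true) {
                    Optional<I> item = in.take();
                    if (!item.isPresent()) {          // empty Optional = end of stream (assumed convention)
                        out.put(Optional.empty());    // propagate the marker down the hierarchy
                        return;
                    }
                    out.put(Optional.of(abstraction.apply(item.get())));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** The Manager wires up the Feeder and Workers, starts them, and collects the output. */
    static List<String> runManager(List<Double> rawValues) {
        BlockingQueue<Optional<Double>> raw = new LinkedBlockingQueue<>();
        BlockingQueue<Optional<String>> level = new LinkedBlockingQueue<>();
        BlockingQueue<Optional<String>> flag = new LinkedBlockingQueue<>();

        // Feeder: an in-memory list stands in for the relational database.
        Thread feeder = new Thread(() -> {
            try {
                for (Double v : rawValues) {
                    raw.put(Optional.of(v));
                }
                raw.put(Optional.empty());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Worker for a primitive parameter: works directly on raw values.
        Thread primitive = new Thread(new Worker<Double, String>(
                raw, level, v -> v > 1000.0 ? "HIGH" : "LOW"));
        // Worker one level up: consumes the abstracted values of the Worker below it.
        Thread derived = new Thread(new Worker<String, String>(
                level, flag, s -> s.equals("HIGH") ? "ALERT" : "OK"));

        feeder.start();
        primitive.start();
        derived.start();

        List<String> results = new ArrayList<>();
        try {
            for (Optional<String> r = flag.take(); r.isPresent(); r = flag.take()) {
                results.add(r.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while collecting results", e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(runManager(Arrays.asList(50.0, 5000.0, 200.0)));
    }
}
```

In this sketch the Manager touches the Workers only at start-up and when collecting the final output, and each Worker sees only its input and output queues, which is one way to keep the abstractions independent of one another.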
To create a distributed version of the system, we specify one MasterManager (MM), which delegates tasks to distributed Managers. Using Java’s RMI, the individual Managers register with the MasterManager to offer their services. The MasterManager then assigns tasks to these Managers using the RMI mechanism. It monitors the Managers and reallocates tasks as necessary when a Manager fails or when more Managers become available. The distributed architecture is shown in the figure.
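The registration and reallocation behaviour can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the interfaces `ManagerService` and `MasterManagerService`, their methods, the round-robin assignment policy, and the task names are hypothetical and not taken from RASTA. The interfaces are declared the way RMI remote interfaces are (they extend `java.rmi.Remote` and their methods throw `RemoteException`), but to keep the example self-contained the objects are called in-process; exporting them and looking them up through an RMI registry is omitted.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DistributedAbstractionSketch {

    /** Hypothetical remote interface that each Manager would expose. */
    interface ManagerService extends Remote {
        String name() throws RemoteException;
        void runTask(String abstractionTask) throws RemoteException;
    }

    /** Hypothetical remote interface that Managers call to offer their services. */
    interface MasterManagerService extends Remote {
        void register(ManagerService manager) throws RemoteException;
    }

    static final class MasterManager implements MasterManagerService {
        private final List<ManagerService> managers = new ArrayList<>();

        @Override
        public synchronized void register(ManagerService manager) {
            managers.add(manager);
        }

        /**
         * Assigns tasks round-robin (an assumed policy). A RemoteException is
         * treated as Manager failure: the Manager is dropped and the task is
         * retried on one of the remaining Managers.
         */
        synchronized Map<String, String> assignAll(List<String> tasks) {
            Map<String, String> assignment = new LinkedHashMap<>();
            Deque<String> pending = new ArrayDeque<>(tasks);
            int next = 0;
            while (!pending.isEmpty()) {
                if (managers.isEmpty()) {
                    throw new IllegalStateException("no Managers available");
                }
                String task = pending.peek();
                ManagerService manager = managers.get(next % managers.size());
                try {
                    String managerName = manager.name();
                    manager.runTask(task);
                    assignment.put(task, managerName);
                    pending.poll();
                    next++;
                } catch (RemoteException e) {
                    managers.remove(manager);   // task stays pending and is reallocated
                }
            }
            return assignment;
        }
    }

    /** In-process stand-in for a Manager; 'failing' simulates an unreachable host. */
    static final class LocalManager implements ManagerService {
        private final String name;
        private final boolean failing;

        LocalManager(String name, boolean failing) {
            this.name = name;
            this.failing = failing;
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public void runTask(String abstractionTask) throws RemoteException {
            if (failing) {
                throw new RemoteException(name + " unreachable");
            }
        }
    }

    static Map<String, String> demo() {
        MasterManager master = new MasterManager();
        master.register(new LocalManager("A", false));
        master.register(new LocalManager("B", true));   // simulated failure
        master.register(new LocalManager("C", false));
        return master.assignAll(Arrays.asList("t1", "t2", "t3", "t4"));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the demo, the task first offered to the failing Manager `B` ends up on `C`, and `B` receives no further work. A fuller version would also rebalance when new Managers register mid-run, which this sketch does not attempt.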

RESULTS AND DISCUSSION
We evaluated the algorithm by running a setup with two Managers first locally on a single machine and then on different CPUs. The test used 60,720 sets of HIV viral load values to abstract drug-resistance patterns [3]. The difference in mean cost between the standalone and distributed versions over 3 runs was negligible (13.4 s vs. 13.5 s), and this comparison includes the overhead of distributed computing caused by network communication.
Our algorithm can be flexibly deployed as a single standalone process, or it can be distributed among multiple processes on one or more machines. Because the distributed algorithm allows independent evaluation of each abstraction in the abstraction hierarchy, it can undertake temporal abstractions of large databases efficiently by using separate processes for each portion of an abstraction tree.
References
- 1. O’Connor MJ, Grosso WE, Tu SW, Musen MA. RASTA: A distributed temporal abstraction system to facilitate knowledge-driven monitoring of clinical databases. Medinfo 2001. Vol. 10, Pt 1. London, U.K.; 2001. pp. 508–12.
- 2. Shahar Y, Musen MA. Knowledge-based temporal abstraction in clinical domains. Artificial Intelligence in Medicine. 1996;8:267–98. doi: 10.1016/0933-3657(95)00036-4.
- 3. Rafiq MI, O’Connor MJ, Das AK. Computational method for temporal pattern discovery in biomedical genomic databases. Proceedings of 2005 IEEE Computational Systems Bioinformatics Conference; Stanford, CA. 2005. pp. 362–5.
