Apollo: Giving application developers a single point of access to public health models using structured vocabularies and Web services

Michael M Wagner; John D Levander; Shawn Brown; William R Hogan; Nicholas Millett; Josh Hanna

2013 Nov 16;2013:1415–1424.

PMCID: PMC3900155 PMID: 24551417

Abstract

This paper describes the Apollo Web Services and Apollo-SV, its related ontology. The Apollo Web Services give an end-user application a single point of access to multiple epidemic simulators. An end user can specify an analytic problem—which we define as a configuration and a query of results—exactly once and submit it to multiple epidemic simulators. The end user represents the analytic problem using a standard syntax and vocabulary, not the native languages of the simulators. We have demonstrated the feasibility of this design by implementing a set of Apollo services that provide access to two epidemic simulators and two visualizer services.

Introduction

The goal of the Apollo project is to create a standard way for programs to access epidemic simulators and thus increase the accessibility, ease of use, and utility of epidemic simulators for research and public health practice.

Our approach is to develop an ontology, called Apollo-SV (Structured Vocabulary), for the domain of epidemic simulation, proceeding through a series of releases (versions) of the ontology with increasing coverage of diseases and control measures. The ontology provides a standard vocabulary and set of definitions, to which we add standard syntaxes for representing simulator configuration and simulation results. The Apollo Web Services combine these elements into an operational system that end-user applications can use to find and run epidemic simulators.

Background

An epidemic simulator is an algorithm that takes as input the current disease state of a population and optionally a description of disease control measures, and produces as output predictions of future disease states. The input is referred to as the simulator configuration.

Although there is heterogeneity among epidemic simulators, a dominant model at present is agent-based simulation. Agent-based simulation is attractive to analysts because it allows fine-grained socio-demographic and geographic stratification of a population. Agent-based simulation also makes it possible to simulate disease control measures that work by increasing the social distance among individuals, such as school closure. A standard language for describing the configuration of simulators must be expressive enough to represent these stratifications.

Epidemic simulators are of increasing importance due to bioterrorism and the threat of emerging diseases. They were integral to the U.S. response to the 2009 H1N1 pandemic, when groups such as the NIGMS Modeling Infectious Disease Agent Studies (MIDAS) research network were called into action to provide operational modeling for the Department of Health and Human Services, the Centers for Disease Control, and the Department of Homeland Security.¹ During the pandemic, decision makers worried whether the new H1N1 vaccine would be available in time¹,² and therefore considered closing schools,³,⁴ prioritizing vaccination to certain groups, and using adjuvants to increase the vaccine supply. To inform these decisions, analysts ran thousands of epidemic models of these disease control measures under different assumptions about the expected outbreak’s timing, reproductive rate, incubation period, case severity, and other characteristics.⁵,⁶

However, the 2009 H1N1 experience identified a limitation of existing epidemic simulators, which this project addresses: To use them, a great deal of time and effort must be spent translating possible outbreak scenarios and control measures into the non-standard languages for configuration and results used by different epidemic models. The terminology and syntax of the configuration files were all unique; thus, it was difficult to know whether two models were modeling the same complex scenario. The problem was compounded by the need to explore many scenarios. The policy exploration was iterative, with each evolution having a turnaround time of many hours to a day. Without standardization, the use of epidemic simulation to guide response in practice will continue to be labor intensive and error prone.

Methods

We created a standard for epidemic simulator configuration and output and a set of Web services that use this standard to enable programmatic access to simulators. The standard comprises the Apollo-SV ontology, a vocabulary defined by the ontology, a message syntax, and a set of well-defined programmatic interfaces (APIs).

Apollo-SV

Apollo-SV is an application ontology; it supports applications that find, configure and run epidemic simulators. Apollo-SV represents entities referenced in epidemic simulator configuration files and outputs, such as disease control strategies, vaccination efficacy, and fraction of population immune.

We developed Apollo-SV through an iterative process that began with analysis of the terms used in the configuration and output files of existing epidemic models including the FRED agent-based model and a SEIR model developed at the University of Pittsburgh. We chose an initial set of terms that would allow us to perform basic configuration of the epidemic simulators and understand enough output to plot epidemic curves and draw maps using visualizer services we discuss below. The next step was creating an entry in a white paper for each entity to which the terms in these files refer (Box 1). Each entry included the original term(s) (for tracking purposes), a disambiguated standard term, a unique term for use in Apollo Web Services (called the Unique Apollo Label), a formal ontological textual definition, and an elucidation that recasts this definition in language more familiar to subject matter experts. The textual definition is a precise, formal-ontological designation of the entities to which the class refers. The elucidation for each class helps the end-user select the terms she needs to configure epidemic models exactly according to her intention.

Box 1. White paper entry.

URI: http://purl.obolibrary.org/obo/APOLLO_SV_00000016

Unique Apollo Label: infectious period

Label: duration of infectiousness measurement datum

Definition: The measurement datum for the duration of the parts of an infectious disease course during which the host bears an infectious disposition in a population of hosts.

Elucidation: The duration of the infectiousness of infectious individuals in a population expressed in time step units, for example “6.1”

The subject matter experts and ontologists on the team iterated over the included terms, Unique Apollo Labels, textual definitions, and elucidations until the white paper reached stability. At that point, we created Apollo-SV as a Web Ontology Language (or OWL) artifact, building each entry in the white paper as a class in the ontology. We annotated each class with the disambiguated term, Unique Apollo Label, textual definition, and elucidation from the white paper. We also constructed logical definitions of each class from the white paper as description logic (DL) axioms using the DL supported by OWL 2.0. In the process, we imported pre-existing classes from the Ontology for General Medical Science, Infectious Disease Ontology, Phenotypic Quality Ontology, Ontology for Biomedical Investigations, Information Artifact Ontology, and the Ontology of Medically Related Social Entities. We used the MIREOT Protégé plugin described by Hanna et al.⁷ to carry out the import process.

Vocabulary

The vocabulary used in the Apollo Web Services comprises the following 42 Unique Apollo Labels:

Software identification	Reproduction number	Reactive control measure
Software developer	Asymptomatic infection fraction	Vaccination control measure
Software name	Simulated population	Vaccine supply schedule
Software version	Population location	Vaccination administration schedule
Requester ID	Susceptible	Vaccination control measure compliance
Run ID	Exposed	Vaccination efficacy
Simulator time specification	Infectious	Vaccination efficacy delay
Time step	Recovered	Antiviral control measure
Time step unit	Symptomatic	Antiviral efficacy
Time step value	Asymptomatic	Antiviral efficacy delay
Run length	Awaiting control measure	Antiviral control measure compliance
Disease	Not awaiting control measure	Antiviral supply schedule
Infectious period	Received control measure	Antiviral administration schedule
Latent period	Awaiting effective control measure	Time series

Syntax

We use an XML Schema Definition (XSD) file to represent the simulator configuration. XSD is a W3C-recommended language used to define sets of rules to which XML files must conform in order to be considered valid.

The SimulatorConfiguration data type is defined compositionally by six data types that specify (1) the simulator; (2) a user’s authentication credentials; (3) the temporal granularity and run length of a simulation; (4) the simulated population and its initial disease state; (5) the infectious disease, and (6) control measures (Figure 1).
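The six-part composition described above can be pictured with a small Python sketch. This is an illustration only, not the Apollo XSD: the field names are adapted from the Unique Apollo Labels listed in the Vocabulary section, while the class layout, the `requester_password` field, and all example values are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SoftwareIdentification:          # (1) the simulator
    software_developer: str
    software_name: str
    software_version: str


@dataclass
class Authentication:                  # (2) the user's credentials
    requester_id: str
    requester_password: str            # hypothetical field, not in the vocabulary


@dataclass
class SimulatorTimeSpecification:      # (3) temporal granularity and run length
    time_step_unit: str
    time_step_value: float
    run_length: int


@dataclass
class SimulatedPopulation:             # (4) population and initial disease state
    population_location: str           # e.g. an INCITS code
    initial_disease_state: Dict[str, float]


@dataclass
class Disease:                         # (5) the infectious disease
    disease_name: str
    infectious_period: float
    latent_period: float
    reproduction_number: float
    asymptomatic_infection_fraction: float


@dataclass
class VaccinationControlMeasure:       # (6) one kind of control measure
    compliance: float
    efficacy: float
    efficacy_delay: float


@dataclass
class SimulatorConfiguration:
    software_identification: SoftwareIdentification
    authentication: Authentication
    simulator_time_specification: SimulatorTimeSpecification
    simulated_population: SimulatedPopulation
    disease: Disease
    control_measures: List[VaccinationControlMeasure] = field(default_factory=list)


# All values below are invented for illustration.
config = SimulatorConfiguration(
    software_identification=SoftwareIdentification("example developer", "FRED", "x.y"),
    authentication=Authentication("example-user", "example-password"),
    simulator_time_specification=SimulatorTimeSpecification("day", 1.0, 90),
    simulated_population=SimulatedPopulation(
        "42003",
        {"susceptible": 0.95, "exposed": 0.0, "infectious": 0.05, "recovered": 0.0},
    ),
    disease=Disease("influenza", 6.1, 2.0, 1.3, 0.33),
    control_measures=[VaccinationControlMeasure(0.5, 0.7, 14.0)],
)
print(config.disease.infectious_period)
```

Grouping the fields this way mirrors the compositional definition: an application fills in each part once and hands the whole object to a service.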

Figure 1. The SimulatorConfiguration and related types in the Apollo Web Services.

The standard terms in the XSD file are represented by compacted versions of their Unique Apollo Labels. A compacted version removes the white space from a label and capitalizes the first letter that followed each deleted white space.
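The compaction rule can be written as a short Python function. This is a sketch of the rule as stated, not code from the Apollo project; the function name `compact_label` is ours, and the sketch assumes the first word of a label is left exactly as written, which the text does not specify.

```python
def compact_label(label: str) -> str:
    """Remove white space and capitalize the first letter after each removed gap."""
    words = label.split()
    if not words:
        return ""
    first, rest = words[0], words[1:]
    # Only the first character of each later word changes; the rest is kept as is.
    return first + "".join(word[:1].upper() + word[1:] for word in rest)


print(compact_label("vaccination efficacy delay"))
```

Under these assumptions, a label such as "infectious period" compacts to a single token that is valid as an XML element name.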

SQL

Figure 2 shows the schema of the results database. The database schema has two principal entities: simulated population, in which each record represents a stratum (i.e., a spatial/sociodemographic subpopulation) of a simulated population; and time series, in which each record represents the count of individuals in a stratum at one time step of a simulation.

The stratification of a simulated population is represented by population characteristics, which are specified as orthogonal axes such as gender, age-range, disease status, and location. The axes take values such as male and female or INCITS (formerly known as FIPS) location codes. This schema is designed to accommodate whatever axes and values a simulator requires. The time series entity represents the counts of each simulated population for each time step of a simulation.

Figure 3 shows example output for a simulation that has just four simulated populations—those individuals who are susceptible, exposed, infectious, and recovered in Allegheny County (INCITS 42003). The time series table shows how the counts for these four simulated populations changed from time step 1 to time step 2.

The terminology for the axes and values in the population characteristics table is defined by the Apollo-SV ontology. The ontology ensures that values for a specific axis are disjoint.
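To make the two-entity design concrete, the following Python sketch builds a miniature results database in SQLite. It is not the schema of Figure 2: the table names, column names, run identifier, and every count are assumptions chosen for illustration. It stores the four Allegheny County strata described above and retrieves the infectious counts for two time steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE simulated_population (
    population_id INTEGER PRIMARY KEY,
    run_id        TEXT NOT NULL
);
-- One row per (stratum, axis); the key allows only one value per axis.
CREATE TABLE population_characteristic (
    population_id INTEGER NOT NULL REFERENCES simulated_population(population_id),
    axis          TEXT NOT NULL,
    value         TEXT NOT NULL,
    PRIMARY KEY (population_id, axis)
);
-- One row per (stratum, time step).
CREATE TABLE time_series (
    population_id INTEGER NOT NULL REFERENCES simulated_population(population_id),
    time_step     INTEGER NOT NULL,
    pop_count     INTEGER NOT NULL,
    PRIMARY KEY (population_id, time_step)
);
""")

states = ["susceptible", "exposed", "infectious", "recovered"]
# Invented counts per time step, in the same order as `states`.
counts = {1: [1000, 10, 5, 0], 2: [990, 15, 8, 2]}

for pid, state in enumerate(states, start=1):
    conn.execute("INSERT INTO simulated_population VALUES (?, ?)", (pid, "run-1"))
    conn.executemany(
        "INSERT INTO population_characteristic VALUES (?, ?, ?)",
        [(pid, "disease state", state), (pid, "location", "42003")],
    )
    for step, row in counts.items():
        conn.execute(
            "INSERT INTO time_series VALUES (?, ?, ?)", (pid, step, row[pid - 1])
        )

infectious = conn.execute("""
    SELECT t.time_step, t.pop_count
    FROM time_series AS t
    JOIN population_characteristic AS c ON c.population_id = t.population_id
    WHERE c.axis = 'disease state' AND c.value = 'infectious'
    ORDER BY t.time_step
""").fetchall()
print(infectious)
```

Storing characteristics as axis/value rows, rather than as fixed columns, is one way to accommodate whatever axes a simulator requires without changing the table definitions.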

APIs of the Apollo Web Services

The Apollo Web Services at present comprise four types of Web service, each of which defines a programmatic interface for a class of applications such as epidemic simulators, visualizers, and programs that generate synthetic populations. The four service types are the Apollo Service, Simulator Service, Visualizer Service, and Synthetic Population Service.

We implemented these services using the SOAP protocol, which is an XML-based protocol that enables applications to exchange data over the Internet. This protocol, which most commonly transmits messages using the widely supported Hypertext Transfer Protocol (HTTP), is both platform and language independent.

Figure 4 shows the two types of services that an end-user application would use to configure and run an epidemic simulator. An end-user application communicates directly only with the Apollo Service, which mainly functions to route service requests to other services. To run a simulation, an end-user application invokes the runSimulation method of the Apollo Service with a simulator configuration object as parameter. The Apollo Service then invokes the run method of the Simulator Service, here the FRED Simulator Service, with the simulator configuration object. The Simulator Service translates the simulator configuration information transmitted as a parameter with the SOAP request to the native vocabulary and syntax of the simulator. It then starts the simulator and returns a run identifier.

When the simulator has completed the run, it writes its output in standard format to a results database. At present, our group maintains a single results database for the two simulators that are connected to Apollo. However, the architecture is flexible and each Simulator Service can maintain a results database for its own results.

The end-user application invokes the Apollo Service’s getRunStatus method with a run identifier to determine whether the simulator has completed the run.
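The run-then-poll interaction can be sketched in Python with in-memory stand-ins for the two services. Only the method names `runSimulation`, `run`, and `getRunStatus` come from the text; the stub classes, the status strings "RUNNING" and "COMPLETED", the `softwareName` routing key, and the `wait_for_run` helper are assumptions for illustration. A real client would call a proxy generated from the WSDL rather than these stubs.

```python
import itertools
import time


class StubSimulatorService:
    """Stand-in for a Simulator Service that finishes after a fixed number of polls."""

    def __init__(self, name, polls_until_done=2):
        self._name = name
        self._polls_until_done = polls_until_done
        self._remaining = {}
        self._ids = itertools.count(1)

    def run(self, configuration):
        # A real service would translate the configuration and start the simulator.
        run_id = f"{self._name}-{next(self._ids)}"
        self._remaining[run_id] = self._polls_until_done
        return run_id

    def getRunStatus(self, run_id):
        if self._remaining[run_id] > 0:
            self._remaining[run_id] -= 1
            return "RUNNING"
        return "COMPLETED"


class StubApolloService:
    """Stand-in for the Apollo Service: it only routes requests."""

    def __init__(self, simulators):
        self._simulators = simulators
        self._owner = {}

    def runSimulation(self, configuration):
        service = self._simulators[configuration["softwareName"]]
        run_id = service.run(configuration)
        self._owner[run_id] = service
        return run_id

    def getRunStatus(self, run_id):
        return self._owner[run_id].getRunStatus(run_id)


def wait_for_run(apollo, run_id, poll_seconds=0.0, max_polls=100):
    """Poll getRunStatus until the run completes or the poll budget is spent."""
    for _ in range(max_polls):
        status = apollo.getRunStatus(run_id)
        if status == "COMPLETED":
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} did not complete after {max_polls} polls")


apollo = StubApolloService({"FRED": StubSimulatorService("FRED")})
run_id = apollo.runSimulation({"softwareName": "FRED", "runLength": 90})
print(run_id, wait_for_run(apollo, run_id))
```

The end-user application sees only the Apollo Service; which Simulator Service handled the run is hidden behind the run identifier.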

The Apollo Service also includes a getRegisteredServices method that returns a list of available Apollo Web Services. An end-user application or other client of the Apollo Service uses the getRegisteredServices method to find services.

A developer of an end-user application connects by “consuming” the WSDL of the Apollo Service. WSDL, which stands for Web Services Description Language, is an XML format that defines the methods and message syntax of a Web service.

The specifics of how the developer of an end-user application consumes a WSDL depend on the programming language in which the application is being developed, but in general the process is easy because many tools automate the generation of the requisite code. For example, Java programmers often use the “WSDL2Java” tool (included in both the Apache CXF and Apache Axis Java libraries).

Once the Apollo Service WSDL is consumed, the developer has access within his programming environment to the following Apollo Service methods: getRegisteredServices, runSimulation, runVisualization, and getRunStatus; and to the following Apollo Service data types: the configuration object and all its related classes.

Developing a Simulator Service

To make it easier for developers of epidemic simulators to create Simulator Services, we offer skeleton implementations of a Simulator Service in Python and Java. Using the Python implementation as an example, the simulator developer would download the Simulator Service skeleton and then complete the method stubs in SimulatorService.py for “soap_run” and “soap_getRunStatus”.
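A completed pair of stubs might look roughly like the following Python sketch. It is not the Apollo skeleton: only the method names `soap_run` and `soap_getRunStatus` come from the text, and their signatures, the status strings, the `NATIVE_NAMES` mapping, the native parameter names, and the `simulate` callable are all assumptions. The sketch shows the two responsibilities described earlier: translating the standard configuration into the simulator's native syntax, then starting the run and returning a run identifier.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

# Hypothetical mapping from Apollo terms to one simulator's native parameter names.
NATIVE_NAMES = {
    "infectiousPeriod": "days_infectious",
    "latentPeriod": "days_latent",
    "runLength": "days",
}


def translate_configuration(configuration):
    """Render an Apollo-style configuration dict as native 'name = value' lines."""
    lines = []
    for apollo_term, native_name in NATIVE_NAMES.items():
        if apollo_term in configuration:
            lines.append(f"{native_name} = {configuration[apollo_term]}")
    return "\n".join(lines)


class SimulatorService:
    def __init__(self, simulate, max_workers=1):
        # `simulate` is a callable that runs the simulator on native input.
        self._simulate = simulate
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._runs = {}

    def soap_run(self, configuration):
        """Translate the configuration, start the simulator, return a run ID."""
        run_id = uuid.uuid4().hex
        native = translate_configuration(configuration)
        self._runs[run_id] = self._executor.submit(self._simulate, native)
        return run_id

    def soap_getRunStatus(self, run_id):
        """Report whether the background run has finished."""
        future = self._runs.get(run_id)
        if future is None:
            return "UNKNOWN"
        return "COMPLETED" if future.done() else "RUNNING"


service = SimulatorService(simulate=lambda native: native)
run_id = service.soap_run({"infectiousPeriod": 6.1, "runLength": 90})
print(service.soap_getRunStatus(run_id))
```

Running the simulator in a background worker lets `soap_run` return the run identifier immediately, which matches the polling pattern used by end-user applications.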

Additionally, the simulator developer must modify her epidemic simulator to write results to the Apollo results database and register her simulator with Apollo.

At this point, we have described the components of the Apollo Web Services that a developer of a simulator needs to know about when developing a Simulator Service for his or her simulator. We next describe how an end-user application queries the results database.

Results Retrieval and Visualization

At present, the Apollo Web Services support results retrieval with a third service type, called the Visualizer Service. Visualizer Services create graphs and maps that display simulator results. Figure 5 shows a Visualizer Service for the GAIA visualizer. GAIA is a program that takes as input the results of a simulation in text file form and outputs maps and movies of maps.

Figure 5. The Apollo Service and a Visualizer Service for the GAIA visualizer, which generates maps and videos of disease spread.

When using a Visualizer Service, an end-user application invokes the Apollo Service’s runVisualizer method with an SQL query and a simulator run ID. In our current implementation, the visualizer (e.g., GAIA) obtains the result data by directly querying the results database. The Visualizer Service then returns a visualization run ID and a URL where the visualizer will write its output (a video, for example).

As when running a simulator, the end-user application uses the getRunStatus method to poll the Apollo Service to determine when the visualization is complete. When the job is completed, the end-user application downloads the visualization from the URL.

The fourth type of service is the Synthetic Population service. A synthetic population is a set of synthetic individuals for use in an agent-based simulator. A Synthetic Population service retrieves a set of synthetic individuals for a given location who, in the aggregate, match key geographic and sociodemographic stratifications of the actual population such as age, gender, home, school and work locations.