Journal of the American Medical Informatics Association: JAMIA. 2001 Mar-Apr;8(2):146–162. doi: 10.1136/jamia.2001.0080146

Requirements for Medical Modeling Languages

Arnoud AF van der Maas 1, Arthur HM Ter Hofstede 1, A Johannes Ten Hoopen 1
PMCID: PMC134554  PMID: 11230383

Abstract

Objective: The development of tailor-made domain-specific modeling languages is sometimes desirable in medical informatics. Naturally, the development of such languages should be guided. The purpose of this article is to introduce a set of requirements for such languages and show their application in analyzing and comparing existing modeling languages.

Design: The requirements arise from the practical experience of the authors and others in the development of modeling languages in both general informatics and medical informatics. The requirements initially emerged from the analysis of information modeling techniques. The requirements are designed to be orthogonal, i.e., one requirement can be violated without violation of the others.

Results: The proposed requirements for any modeling language are that it be “formal” with regard to syntax and semantics, “conceptual,” “expressive,” “comprehensible,” “suitable,” and “executable.” The requirements are illustrated using both the medical logic modules of the Arden Syntax as a running example and selected examples from other modeling languages.

Conclusion: Activity diagrams of the Unified Modeling Language, task structures for work flows, and Petri nets are discussed with regard to the list of requirements, and various tradeoffs are thus made explicit. It is concluded that this set of requirements has the potential to play a vital role in both the evaluation of existing domain-specific languages and the development of new ones.


Software engineering practice in contemporary health care involves selecting existing methods and technology and applying them correctly. Examples include the use of object orientation,1 component-based development,2 workflow modeling,3 conceptual graphs,4 graph grammars,5 and decision networks.6 However, in the development of complex dedicated applications—such as configurable patient record systems for research purposes, patient-record-based reviewing systems, predictive data entry systems, care process monitoring systems, patient case simulators, and case report retrieval—existing technology may be insufficient or unsuitable in some cases. The improvement of existing methods and techniques or even the invention of new specialized ones then becomes necessary.

A domain-specific modeling language is a language designed specifically for solving certain types of problems in a restricted, predetermined domain. Typically, this means that concepts occurring in the domain are naturally and directly supported by concepts in the modeling language. Development of domain-specific modeling languages in the field of medicine happens only rarely. An example is the Arden Syntax, which is domain specific with respect to modeling valid alert tasks. Alert tasks are triggered in special situations and produce logical messages to suggest actions. Hence, the Arden medical logic modules (MLMs)7 use concepts such as “evoke,” “logic,” “action,” “message,” and “literature citation” (see Figure 2).

Figure 2.

An Arden medical logic module (MLM) with all optional slots.7

Having recognized the need for the development of domain-specific languages, it is important to establish requirements by which such development should be guided.

The requirements presented in this paper emerged initially from the analysis and development of information modeling techniques8 such as PSM9 and LISA-D,10 and, in a medical informatics context, the development of PCRL11 and DCGL12 led to their present formulation and application. These requirements will be presented, illustrated, and justified in this paper. Arden MLMs, with which many readers are familiar, are used as a running example, and several examples from other, more advanced modeling languages are provided as well.

In brief, these requirements can be summarized as follows: Domain-specific languages should be conceptual, to avoid the specification of domain-irrelevant details; formal, to avoid ambiguity and allow for sophisticated automated support; expressive, to fully capture a problem; suitable, to capture a problem conveniently; and comprehensible, so that the models can be communicated to non-computer experts. Insight into the meaning of a model might be improved by its execution. Therefore, domain-specific models also have to be executable.

These criteria are discussed in detail in the following sections. The paper concludes with an example in which the requirements are applied in a software-engineering context in health care, to illustrate the principles in their assessment of existing modeling languages, on the one hand, and to give some insight into possible improvements inspired by the principles, on the other.

Formal

Development of formal modeling languages is very complicated and typically requires substantial experience and mathematical maturity in the developer. It is also a process that is usually not addressed in university curricula, and many formalization experts are basically self-taught.

The need for formal foundations of modeling languages has often been addressed in the computer science literature.13–19 The main reason to formalize is to avoid ambiguity. This is typically necessary when models are to be communicated, especially with computers.

Ambiguity typically occurs when natural language is used, even in such a seemingly formal way as in structured English.20 Graphs, too, can be notoriously ambiguous. Even very simple graphs—like the graph shown in Figure 1, which is taken from a methodology book for medical researchers21—can be interpreted in different ways. When interpreting Figure 1, for example, it may be concluded that a “disease” is an activity, which uses a “diagnosis” and an “etiognosis” to produce a “prognosis.” This interpretation is, of course, nonsense, but the fact that it is nonsense could not be derived from the picture itself. Hence, Figure 1 without formalization serves as an illustration rather than a knowledge model. In the field of artificial intelligence, the IS-A link, employed in many medical knowledge graphs, is a typical source of ambiguity. Nearly every type of semantic network assigns a different (often only intuitive) meaning to this type of link.22

Figure 1.

An ambiguous graph.

Once models have been computerized, tools can provide sophisticated additional support to modelers. Automated tools are able to perform syntactic and semantic checks (verification) or simulate complex behavior (validation). For example, simulation of disease course models enhances the understanding of the meaning of the disease course specification and its implications considerably.

A modeling language is formal if and only if it has a precise definition of its syntax and semantics. Syntax is needed to define what well-formed models are, i.e., what models conform to the rules of the language, while semantics is needed to assign a meaning to well-formed models.19 We will first focus on syntax before concentrating on the even more important semantics. In particular, we will start with contrasting concrete and abstract syntax and argue why the latter is preferred.

Syntax

When defining the syntax of a modeling language, it is customary to use Backus Naur form (BNF).23 The BNF definition of radiology finding,24 for example, looks like this:

Radiology-finding ::= has observation: observation;
                      has location: body-location*;
                      has location qualifier: location-qualifier*;
                      has presence: certainty*;
                      has degree: degree*;
                      has temporal: temporal*;
                      has quantity: quantity*;
                      has property: property*

An asterisk (*) indicates that the component is optional, and the semicolon (;) separates components. The radiology definition could also look like this:

Radiology-finding ::= an amount of [quantity][degree][observation](s)
                      with property [property] occurring [temporal]
                      is observed at [location-qualifier] of [body-location]
                      with a certainty of [certainty]

In the latter definition, an explicit choice of keywords (the words outside the square brackets) is made, as well as an explicit choice of the order of the various components of the radiology finding. The latter definition obfuscates the deeper, underlying structure of what a radiology finding is. Different users may prefer different keywords and different orders of the various parts. The underlying structure, whether formulated in English or in French, remains essentially the same.
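The contrast between the two definitions can be sketched in code. The following is a minimal illustration, not part of the original definitions: the class name, the restriction to three of the eight components, and the French keywords are all assumptions made for the example. One structure holds only the components; two renderers add keywords and ordering.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RadiologyFinding:
    """Components only: no keywords and no presentation order (hypothetical subset)."""
    observation: str
    body_location: List[str] = field(default_factory=list)
    certainty: List[str] = field(default_factory=list)


def render_english(f: RadiologyFinding) -> str:
    """One concrete syntax: English keywords in a fixed order."""
    text = f.observation
    if f.body_location:
        text += " is observed at " + ", ".join(f.body_location)
    if f.certainty:
        text += " with a certainty of " + ", ".join(f.certainty)
    return text


def render_french(f: RadiologyFinding) -> str:
    """Another concrete syntax: illustrative French keywords, same structure."""
    text = f.observation
    if f.body_location:
        text += " est observé au niveau de " + ", ".join(f.body_location)
    if f.certainty:
        text += " avec une certitude de " + ", ".join(f.certainty)
    return text


finding = RadiologyFinding("nodule", ["left lung"], ["probable"])
english = render_english(finding)
french = render_french(finding)
```

Both renderers read the same `finding` object; changing a keyword or the order of the parts touches only a renderer, never the structure.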

An abstract syntax definition, like the first one, simply gives the components of a language construct and omits representational details. Although concrete syntax is important and necessary for communication, it introduces nonessential elements. The abstract syntax can be compared with what is referred to as the conceptual level in the classical three-level architecture of databases.25 The conceptual level focuses on concepts as they are, not on how they are perceived (external level) or implemented (internal level). For example, the abstract syntax of bibliography reference styles, such as the Vancouver, Harvard, and Chicago styles, will be exactly the same. This abstract syntax serves as the basis for automated bibliography reference managers, such as BIBTEX in LaTeX.26 Meyer23 stresses that:

The use of abstract syntax rather than concrete syntax as a basis for studies of programming languages is representative of an important trend in software engineering: the move towards a higher-level view of software objects, emphasizing deep structure rather than surface properties. Concepts such as abstract data types are another example of this trend.

Consequently, questions as to whether a care task should be represented as a square or a circle are conceptually irrelevant. Often discussions are blurred as a result of mixing representation with essence. Discussions should shift from these uninteresting representational aspects to the semantic issues, which are of utmost importance.

It is also important not to get carried away by the imposition of syntactic restrictions to exclude models that may at first seem undesirable. The main reason for this is that in the early stages of formalization, it is hard to characterize “undesirable” models and provide a clear justification for this. Situations that seem undesirable at first often do make sense from a semantic point of view and in some cases even add flexibility and expressive power (see, for example, “formalization principles”27).

Semantics

Having defined an abstract syntax for a certain modeling language, the next, even more important step is the definition of semantics, which should ultimately be machine processible. A formal semantics is needed when, for example, a case history is to be matched with a disease course by a computer, since this requires a very precise understanding of what a disease and a case history are. Often, however, defining syntax and defining semantics are not strictly sequential activities. A choice for a certain semantics may lead to improvement of the abstract syntax, i.e., to an abstract syntax that leads to more elegant formulations (but this cannot always be foreseen).

There are many styles of assigning formal semantics. In medical informatics, as an applied science, the definitions of operational and translational semantics are particularly relevant (the following are taken from Meyer23).

  • Translational semantics: In translational semantics, models specified in a certain modeling language are given a semantics by the definition of a mapping to models of a simpler language, a language that is better understood.

  • Operational semantics: If a translational semantics amounts to a compiler for a modeling language, an operational semantics is like an interpreter. The idea is to express the semantics of a modeling language by providing a mechanism that makes it possible to determine the effect of any model specified in the technique. Such a mechanism can be seen as an “interpreting automaton.”

Both styles of assigning semantics are illustrated in the Executable section. Operational semantics, because of its interpreting character, is less suitable when comparing properties with other modeling languages. However, an operational semantics is perfectly acceptable if the primary aim is the development of computer support tools.
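The difference between the two styles can be made concrete with a toy language of additions and subtractions. This is a sketch under our own assumptions: the expression encoding, the stack-machine instructions, and all function names are invented for illustration and are not taken from Meyer.

```python
from typing import List, Tuple, Union

# A model in the toy language: a number, or (operator, left, right).
Expr = Union[int, Tuple[str, "Expr", "Expr"]]


def interpret(e: Expr) -> int:
    """Operational style: an interpreting automaton computes the effect directly."""
    if isinstance(e, int):
        return e
    op, left, right = e
    l, r = interpret(left), interpret(right)
    return l + r if op == "add" else l - r


def translate(e: Expr) -> List[str]:
    """Translational style: map the model to a simpler language
    (here, postfix instructions for a stack machine)."""
    if isinstance(e, int):
        return [f"push {e}"]
    op, left, right = e
    return translate(left) + translate(right) + [op]


def run_stack(program: List[str]) -> int:
    """Semantics of the simpler target language, assumed to be well understood."""
    stack: List[int] = []
    for instr in program:
        if instr.startswith("push "):
            stack.append(int(instr[5:]))
        else:
            r, l = stack.pop(), stack.pop()
            stack.append(l + r if instr == "add" else l - r)
    return stack.pop()


model: Expr = ("sub", 140, ("add", 100, 24))
direct = interpret(model)
program = translate(model)
via_translation = run_stack(program)
```

The translated `program` is an artifact that can be inspected and compared with programs translated from other languages, whereas `interpret` yields only a result; this mirrors the remark that an operational semantics is less suitable for comparing properties across modeling languages.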

MLM Example 1 (Formal)

Medical logic modules have a formal syntax but no formal semantics. There is no mathematical model that defines what MLM specifications mean. There is no formal relation between the intuitive explanation in the MLM explanation slot and its implementation in the MLM logic slot. (MLM slots are shown in Figure 2; in the next section, under Conceptual, we will show that the contents of the MLM logic slot are not conceptual.)

The following MLM logic example28 calculates an anion gap:

  • anion_gap := sodium - (chloride + bicarbonate);

  • anion_gap2 := int(0.5 + anion_gap)

Formal interpretation of this MLM logic requires knowledge of procedural structures such as assignment (:=) and sequence (;), as well as knowledge about the use of an auxiliary variable (anion_gap2) and an auxiliary integer function (int). Hence, a possible way to assign a formal semantics would be to define a mapping from MLM specifications to an abstract machine.
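What such a mapping might look like can be sketched for the two assignments above. This is only an illustration: the state-as-dictionary machine, the `run` function, and the sample laboratory values are assumptions, and Python's `int` (which truncates toward zero) stands in for the Arden `int` function without any claim that the two agree on negative values.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
# A statement is (target variable, expression evaluated in the current state).
Statement = Tuple[str, Callable[[State], float]]


def run(program: List[Statement], state: State) -> State:
    """Sequence (;): execute statements in order.
    Assignment (:=): produce a new state with the target rebound."""
    for target, expression in program:
        state = {**state, target: expression(state)}
    return state


anion_gap_logic: List[Statement] = [
    ("anion_gap", lambda s: s["sodium"] - (s["chloride"] + s["bicarbonate"])),
    ("anion_gap2", lambda s: int(0.5 + s["anion_gap"])),
]

# Hypothetical input values, chosen only to exercise the rounding step.
final = run(anion_gap_logic, {"sodium": 140.0, "chloride": 103.4, "bicarbonate": 24.0})
```

Under this reading, the meaning of the logic slot is the function from initial states to final states computed by `run`.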

Examples of domain-specific modeling languages with a formal syntax are Arden,7 MCRL,29 TSMI,30 and PCRL.11 The following modeling languages have a formal semantics as well: GRAIL,31 KL-ONE32 (for a more recent reference on the formal basis of, for example, SNOMED, see Campbell et al.33), Graph Grammar,5 decision networks,34 and DCGL.12

Conceptual

In his prologue to Foundations of Computing,35 Scheurer argues that the essence of computing might be summed up in the word generality. Ultimately, computing has one fundamental objective—to study and to achieve generality. Programs that have been set up in a general way are more easily adaptable to new requirements,36 and specifications that are general are easier to understand than those laced with details and choices that are too specific.

The notion of generality is closely related to the conceptualization principle,25 which states that conceptual models should deal only and exclusively with aspects of the problem domain. Any aspects irrelevant to the problem domain should be avoided, since including them might exclude possible solutions at too early a stage. Furthermore, such aspects tend to lead to models that are difficult for domain experts to comprehend and communicate, and that easily become outdated. Examples of these conceptually irrelevant aspects are previously mentioned aspects of external-level or internal-level data representation, such as physical data organization and access, as well as message formats and data structures.25

As an illustration of the importance of conceptuality, consider conceptual graphs.37 These graphs are not designed to represent complex nested entities. Suppose a conceptual graph on physiology contains the concept “citric acid cycle.” The analyst then cannot decompose the citric acid cycle into its components without flattening the decomposition (Figure 3). Several alternative ways of flattening are possible, so the modeler has to choose. Such a choice leads to overspecification. For example, the flattening of Figure 3 introduces composition relations, classification relations, renaming, and numbering, which is arbitrary and, hence, not relevant from a conceptual point of view.

Figure 3.

Flattening introduces part-of relations (dashed), kind-of relations (dotted), renaming, and numbering.

The requirement of being conceptual is also applicable in data-intensive domains. Data models such as that represented in Figure 4 are currently used in the design of “code books,” i.e., blueprints for research databases.21 These data models suffer from overspecification. The model shown in Figure 4, for example, contains byte position specifications and position-dependent codes such as AAA and BB. Also, the model hides conceptual hierarchic structures.

Figure 4.

A data model as used in code books.

The conceptual counterpart model is shown in Figure 5. This model focuses on abstract entity types (ovals), their mutual roles (squares), their classification (arrows), their identification (between brackets), their population (superscript between set braces), and their constraints. All other information is left out, since it is not relevant from a conceptual point of view.

Figure 5.

The conceptual counterpart of the data model shown in Figure 4.

Constraints can be formulated graphically as well as textually. The following LISA-D10 expression is an example of a formal textual constraint applicable to the model in Figure 5:

FOR EACH x IN Married participant HOLDS Age of x >= 18
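Because the constraint is formal, it can be checked mechanically against a population. The sketch below is not LISA-D and does not reflect the structure of Figure 5; the record layout, field names, and sample population are assumptions made for illustration.

```python
from typing import Dict, List, Union

Participant = Dict[str, Union[str, int, bool]]


def constraint_violations(population: List[Participant]) -> List[str]:
    """Return the identifiers of married participants younger than 18,
    i.e. the instances for which 'Age of x >= 18' does not hold."""
    return [
        str(p["id"])
        for p in population
        if p["married"] and int(p["age"]) < 18
    ]


# Hypothetical population; only P2 violates the constraint.
population: List[Participant] = [
    {"id": "P1", "age": 34, "married": True},
    {"id": "P2", "age": 17, "married": True},
    {"id": "P3", "age": 16, "married": False},
]
violating = constraint_violations(population)
```

The constraint ranges only over married participants, so the unmarried 16-year-old is not reported.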

MLM Example 2 (Conceptual)

The contents of MLM logic slots (Figure 2) are not logic expressions as used in mathematics; rather, they are Boolean functions with so-called side effects. Hence, interpretation of MLM logic in the previous section (see Example 1) requires knowledge of procedural structures such as assignment (:=) and sequence (;), as well as knowledge about use of auxiliary variables (anion_gap2) and auxiliary integer functions (int). All these aspects are implementation details and would have been different if, for example, a functional language were used. Therefore, we cannot reason about MLMs without knowing implementation details, which makes complex MLMs not conceptual. This aspect restricts validation of complex MLMs on a conceptual level. Fortunately, MLMs can be executed (see discussion under Executable); hence, behavior of the Boolean functions can be tested thoroughly.

Expressive

The 100 percent principle25 states that a conceptual model should describe all relevant static and dynamic aspects of the domain of interest. A modeling language should therefore cover the essential concepts of an application domain. This implies that a modeling language should have sufficient expressive power.

As an illustration of this requirement, consider conventional temporal logic. This type of logic is not sufficient to model case reports because it cannot express patterns such as “nightly headaches” or deal with incomplete knowledge, such as “headache and nausea co-exist.”11 The latter expression, when present in a case report, should not be interpreted as if headache and nausea exist at the same time. It more likely means that “headache and nausea both exist and have a strong unspecified temporal relationship, strongly suggesting that a common underlying process exists.”

MLM Example 3 (Expressive)

Originally the Arden MLM messages were text strings and, hence, not configurable. Now, MLM messages are configurable and can express, for example, calculated drug doses.28 In an example,28 the following configurable message is used to express the patient's hematocrit level (where “hematocrit” between brackets is a variable):

The patient's hematocrit ("||hematocrit||") is low or falling rapidly.

The possibility of programming Boolean functions makes MLMs very expressive. If mathematical logic had been used, the detection of paths would not have been possible, since MLMs cannot be chained. Detection of paths is essential when reasoning about processes. Medical logic modules are almost as expressive as third-generation programming languages. However, when compared with programming languages, MLMs still are more restricted. For example, MLMs cannot produce charts, since output is restricted to textual messages.

Lack of expressive power becomes a true problem when a modeler is forced to oversimplify the domain of interest.

Comprehensible

Since one of the important roles of conceptual models is to establish a common understanding of the domain of interest (especially between domain experts), it is vital that a conceptual model be comprehensible. Engels et al.38 stress that languages for conceptual modeling should be easy to use and easy to learn. Groenboom et al.39 use the following axiom, which is a Z-specification (where “tp” is a time-point):

AXIOM Not (PerfusionPeriod Contains tp) => Valuation(LowBodyTemp).tp = Unacceptable

This axiom expresses that the phenomenon “low body temperature” is unacceptable in situations other than “perfusion.” This Z-specification fulfills the requirement of being formal, since Z has a well-defined formal semantics.14 Although the axiom specification is rather cryptic, it is more or less comprehensible. However, things get less comprehensible when, for example, it is defined that: “lung edema or atelectasis cause lung diffusion to decrease”:

AXIOM EveryWhere(CauseLink((LungEdema Param2 [Or] Atelectasis), MakePhen((Course(LungDiffusion,world).tp) =< Decreased), Obligatory)

Even more problems occur when a set of AXIOMs is to be interpreted.

A domain expert will hardly recognize the presence of chains, essential in causal models, and cycles will be detected only if thoroughly sought. As long as disease courses are to be modeled like this, medical domain experts will not build automated disease course libraries themselves.
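Once causal links are available in a formal representation, the search for chains and cycles that is laborious for a reader can be delegated to a tool. The following depth-first search is a minimal sketch; the adjacency-list encoding is an assumption, and only the two links into "DecreasedLungDiffusion" come from the axiom above. "PhenomenonX" and its links are placeholders added solely to create a cycle.

```python
from typing import Dict, List, Optional


def find_cycle(links: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one causal cycle as a list of phenomena (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for effect in links.get(node, []):
            state = color.get(effect, WHITE)
            if state == GREY:  # effect is already on the current chain
                return path[path.index(effect):] + [effect]
            if state == WHITE:
                found = visit(effect)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in links:
        if color.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


causal_links = {
    "LungEdema": ["DecreasedLungDiffusion"],
    "Atelectasis": ["DecreasedLungDiffusion"],
    "DecreasedLungDiffusion": ["PhenomenonX"],  # placeholder link
    "PhenomenonX": ["LungEdema"],               # placeholder link closing a cycle
}
cycle = find_cycle(causal_links)
```

A graphical notation would expose the same cycle to a domain expert at a glance, which is the argument made in the remainder of this section.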

Comprehensibility can be achieved by offering structuring mechanisms, graphical notations, and by the provision of an intuitive semantics.

By offering structuring mechanisms a specification remains surveyable. An often-used structuring mechanism is decomposition. In Figure 3 for example, removal of decomposition introduces overspecification, which violates conceptuality as well as comprehensibility.

Comprehensibility can also often be improved by the use of graphical notations. Harel40 emphasizes the importance of visual formalisms:

Visual, because they are to be generated, comprehended, and communicated by humans, and formal, because they are to be manipulated, maintained, and analyzed by computers.

Several reasons can be given why graphic representations are more comprehensible than their textual counterparts41:

  • Graphic representations are in two dimensions, whereas text is in one dimension. The former gives an additional degree of freedom in presentation.

  • Graphic representations are more useful in showing the intrinsic structure of complex systems and more natural in describing parallelism.

  • Graphic representations can be read in a selective way, depending on the level of detail required. Text is to be scanned linearly.

  • There is a limit on the number of concepts that can be reasonably held in the short-term memory of the human mind.42 A person reading graphics can start off generally and go down to detail after some degree of familiarization. With text, the reader has to start off with detail and abstract the skeleton concepts while reading.

These reasons are illustrated with a formal textual case report (the informal original is taken from the New England Journal of Medicine44) in Figure 6 and its graphic counterpart in Figure 7.

Figure 6.

A formal PCRL11 case report text.

Figure 7.

Case report graph of case described in Figure 6, after automated conversion.43

Every event or episode described in the left-most column of Figure 7 has, in principle, its own vertical time axis. By the use of a formally defined “horizontal” compression technique,43 several phenomena from the left column (e.g., events that occur strictly at different times, in non-overlapping intervals) can be presented on the same axis. This results, even for large case reports, in manageable and comprehensible charts (for anyone who has learned the simple compression rules).
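The idea of placing non-overlapping phenomena on a shared axis can be illustrated with a simple greedy assignment. This is not the formally defined compression technique of the cited work; it is a stand-in sketch, and the interval encoding, the function name, and the sample phenomena are assumptions.

```python
from typing import List, Tuple

# (label, start, end) with integer time points; end is inclusive.
Interval = Tuple[str, int, int]


def compress(phenomena: List[Interval]) -> List[List[Interval]]:
    """Assign each phenomenon to the first axis on which it starts strictly
    after the previous phenomenon has ended; open a new axis otherwise."""
    axes: List[List[Interval]] = []
    for item in sorted(phenomena, key=lambda p: p[1]):
        for axis in axes:
            if axis[-1][2] < item[1]:  # strictly non-overlapping
                axis.append(item)
                break
        else:
            axes.append([item])
    return axes


# Hypothetical case report fragments.
report = [("fever", 0, 3), ("headache", 2, 5), ("rash", 4, 6), ("nausea", 6, 8)]
axes = compress(report)
```

Four phenomena that would each need their own axis are drawn on two, because "fever" and "rash", and likewise "headache" and "nausea", never overlap.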

Finally, model semantics should be in a style close to intuition. For example, specifications that syntactically resemble natural language may be close to the human intuition and hence comprehensible. An important prerequisite is then, of course, that the associated formal semantics is also close to this intuition. This does not mean that syntax or semantics itself should be intuitively or informally defined!

MLM Example 4 (Comprehensible)

The basic structure of MLMs is easy to comprehend, as most slots, shown in Figure 2, are based on “rational event monitoring” concepts mentioned earlier. Also, the fact that MLMs are self-contained facilitates interpretation. Messages are easily interpreted and formulated. The contents of the MLM logic slot are less easy to understand and formulate. When writing MLMs, knowledge of Pascal or Basic is recommended.7 Therefore, MLMs will be comprehensible for those familiar with modeling procedural knowledge.

Suitable

A modeling language should use concepts that closely resemble the concepts of the domain of interest. For example, conventional temporal logic is not suitable to model disease courses, since it does not support disease course concepts, such as course remissions and asymptomatic courses. For example, Figure 8 shows a DCGL12 disease course model, and Figure 9 shows what the disease course model would look like if DCGL did not have intrinsic course remission semantics. Notice that updating the model shown in Figure 9 with a new disease phenomenon requires all course remission state descriptions to be changed.

Figure 8.

A DCGL disease course graph.12

Figure 9.

The DCGL disease course graph shown in Figure 8 without course remission and asymptomatic semantics.

Although suitability is, to some extent, a subjective notion, a strong link between the concepts offered by the modeling language and the concepts required by the problem domain is clearly desirable. This is illustrated by another example. Conceptual graphs as described by Sowa37 are, for example, not suitable to model the dynamics of disease courses, as they are not really suitable to deal with state transitions. On the other hand, conceptual graphs are suitable when, for example, disease concepts are to be typified or defined in terms of other concepts, as in classification models.

Suitable modeling languages can be expressive without the use of a large number of concepts. The use of a small number of essential concepts is preferred, to keep semantics specification concise. Hence, an important part of modeling language development should be bottom up and should start with a base of suitable syntactic elements inspired by essential concepts of the domain under consideration. This bottom-up principle derives from sound engineering practice as well as from structured programming.

MLM Example 5 (Suitable)

The data-driven messaging as well as the use of previously mentioned event concepts such as “evoke,” “logic,” “action,” “message,” and “literature citation” make MLMs very suitable for modeling event interrupts. However, the fact that MLM logic is not conceptual introduces the need for an explanation slot. Introduction of an informal explanation slot introduces the problem of keeping this slot consistent with the logic slot during the lifetime of an MLM. Also, the use of an informal explanation slot can produce a false appearance of the MLM, since there is no formal relation with the implementation. The introduction of a formal and conceptual semantics would allow explanation texts to be derivable as well as meaningful.

Executable

Validating requirements as early as possible might prevent errors in later stages of system development. The later an error is detected, the more it will cost to correct it. To make validation possible in the early phases of system development, it is important that a specification can be executed. This enhances the understanding of the meaning of a specification and its implications considerably. Therefore, a modeling language is, preferably, executable. We consider a modeling language to be executable if all models it can express are executable. If execution of a model is not intended or even not obvious, it should still be kept in mind. For example, execution of a disease course model could be imagined as the systematic generation of case histories it covers. The generation of histories qualified as unlikely or impossible by domain experts could point to weaknesses in the model. Or, as another example, execution of classification models could be imagined as the systematic generation of implied class hierarchies.
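The notion of executing a disease course model by generating the case histories it covers can be sketched as path enumeration over a state graph. The graph below is hypothetical and much simpler than a DCGL model; the state names, the length bound, and the function name are assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Set


def histories(
    course: Dict[str, List[str]], state: str, finals: Set[str], max_len: int
) -> Iterator[List[str]]:
    """Yield every state sequence from `state` to a final state that
    contains at most `max_len` states."""
    if state in finals:
        yield [state]
        return
    if max_len <= 1:
        return  # bound reached without arriving at a final state
    for successor in course.get(state, []):
        for tail in histories(course, successor, finals, max_len - 1):
            yield [state] + tail


# Hypothetical disease course with a remission loop.
course = {
    "onset": ["active"],
    "active": ["remission", "recovered"],
    "remission": ["active", "recovered"],
}
generated = list(histories(course, "onset", {"recovered"}, max_len=5))
```

A domain expert reviewing `generated` might, for example, judge a history that recovers directly from remission to be implausible, which would point to a superfluous transition in the model.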

Execution of a disease course requires that the formalism have a notion of “flows of symptoms.” During execution, when workflow-based45 operational semantics are used (discussed earlier, under Formal), the disease course graph shown in Figure 8 would look like that shown in Figure 10.

Figure 10.

Disease course graph12 during interpretative execution.

Classification models can also be executed. This counter-intuitive idea will be illustrated with the GRAIL formalism.31 GRAIL is a classification language that supports concept definition and classification. In GRAIL, concepts can be composed using a WHICH-operator. The concept “fracture of the greater trochanter,” for example, is modeled as “fracture WHICH has location greater trochanter.” GRAIL semantics implies that this composed concept is a kind of fracture, i.e., that this composed concept is considered a subclass of “fracture.” This GRAIL semantics is translated (see under Formal) to a relation-oriented inference structure formalism.

Such a network formalism is well suited to inferring subclass hierarchies and attribute inheritance (Figure 11, where the part-of relation is dashed, the has-location relations are dotted, and the subtype relations are solid). Because of this suitable translation, GRAIL can easily execute classification models with thousands of concepts.46
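The inference that a composed concept is a kind of its base concept can be sketched as follows. This is neither GRAIL syntax nor its inference engine; the `Composed` class, the `ancestors` function, and the parent concept "disorder" are assumptions made for illustration, and the declared hierarchy is assumed to be acyclic.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Composed:
    """A base concept refined by (relation, value) criteria,
    in the spirit of 'fracture WHICH has location greater trochanter'."""
    base: str
    criteria: Tuple[Tuple[str, str], ...]


def ancestors(concept: Union[str, Composed], parents: Dict[str, str]) -> List[str]:
    """List the implied superclasses, nearest first. A composed concept is
    a subclass of its base; named concepts follow the declared hierarchy."""
    current = concept.base if isinstance(concept, Composed) else parents.get(concept)
    result: List[str] = []
    while current is not None:
        result.append(current)
        current = parents.get(current)
    return result


declared = {"fracture": "disorder"}  # hypothetical declared subtype relation
trochanter_fracture = Composed("fracture", (("has location", "greater trochanter"),))
implied = ancestors(trochanter_fracture, declared)
```

Executing the classification model in this sense means generating `implied` for every concept, so that the resulting hierarchy can be inspected by domain experts.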

Figure 11.

GRAIL execution through translation.

In this context of the executability requirement, some additional remarks on GRAIL should be made. The ability to execute a model or specification is important, as argued, but it is also important that the performance of computerized tools stays at a reasonable level with the growth of the models, i.e., that computerized verification work is always limited and tractable.47 GRAIL is very expressive, because its foundations are full first-order logic. However, this has its drawbacks with regard to comprehensibility to a certain extent, and also, in principle, to the more practical aspects of “scalability” of models. Up to now, the addition of a (complicated) concept to existing GRAIL models appeared feasible and efficient (because of the tools used and the way models are structured), but the underlying first-order logic can potentially result in an intractable situation, i.e., that the amount of effort needed to extend model verification is not predictable.

MLM Example 6 (Executable)

Medical logic modules automate interrupt messages during delivery of care; hence, MLMs are executable. Next to this computer-assisted decision support, MLM executability also supports MLM validation, since messaging behavior can be simulated on the computer and tested thoroughly.

Application: Example

In this section, the requirements as proposed in this paper will be applied to health care in a software engineering context. This exercise will show how the requirements may be applied to the assessment of existing modeling languages and how they may provide directions for extensions. First we consider activity diagrams, as used in the Unified Modeling Language (UML),48,49 since UML has been adopted by Health Level 7 as the standard in health care.*

Suppose that an electronic patient dossier is to support care coordination in palliative care. Suppose, too, that the patient dossier is to be able to process care plan information such as that presented in example 7:

Example 7

Patient Mr. Smith is admitted to the hospital. He previously received, in another general hospital, a standard oncology treatment, which was not successful. Currently, he is included in a clinical drug trial, the only possibility that resulted from an analysis made by Dr. Nolan, a specialist in (experimental) oncology.

Mr. Smith is not responding well to the experimental treatment. For Mr. Smith now, a minimal chance of cure no longer compensates for the burden of participating in a trial. The care team considers stopping treatment of the primary tumor with the main objective of cure. Dr. Nolan will discuss this subject with Mr. Smith in the next consultation.

All staff members of the oncology department are to be informed about this serious turning point in care objectives and about the opinion of Mr. Smith and the way he copes. Dr. Williams, the general practitioner, will also be informed. Dr. Nolan will involve Dr. Williams even more in the current care delivery by scheduling a joint hospital visit with Mr. Smith and his wife. Before that, Dr. Williams will contact Mrs. Smith at home.

Eventually, case management will be handed back to Dr. Williams (this is a local departmental policy). However, this will be in the long term rather than the short term, because Mr. Smith prefers (for the time being) to be counseled by Dr. Nolan. In the short term, focus will be on minimizing changes in care delivery.

Example 7 can be characterized as a description of flow of control in an individual care process. The care plan information in this example clearly is not about complex objects using complex data structures. Hence, we will not model the flow of control from object to object. Suitable modeling of this care plan information can be realized using a process control perspective; i.e., modeling a flow of control from activity to activity is preferred. The process control perspective makes it possible to focus on the care process, allowing care team members to monitor the progress of care.

In UML, process control flow is captured by means of activity diagrams.48 Activity diagrams are fully independent of implementation and allow a focus on process coordination. As such they are conceptual. They use concepts typical for workflow modeling and hence can be considered suitable for modeling care coordination. In addition, activity diagrams are easy to understand and easy to learn and can therefore be considered comprehensible. The introduction of Paradigm Plus allows activity diagrams to be executed. However, activity diagrams do lack a formal semantics, and as such their semantics is prone to ambiguities.

A simple semantic problem is illustrated in the partial activity diagram shown in Figure 12. This diagram comprises two synchronization bars (represented by horizontal lines) and five activities (represented by rounded rectangles). In activity diagrams, it is not clear whether the synchronization bar has the semantics of a transition, as in Petri nets,50 or and-join semantics, as in task structures51 or typical workflow specification languages. To illustrate this difference, consider a scenario in which both activities A and B in Figure 12 are activated at some point in time but activity C will never be performed. If the synchronization bar has the same semantics as a transition in Petri nets, then activity D will be executed and no deadlock will occur. If, however, the synchronization bar has and-join semantics, then activity D will also be performed, but a deadlock will occur, as the rightmost synchronization bar will keep waiting for an occurrence of activity C.

Figure 12. An ambiguous UML activity diagram.

Clearly, this is an important difference. However, Rumbaugh et al.52 remark, “The fork initiates concurrent activities that logically occur at the same time. Their actual execution may or may not overlap. The concurrency is terminated by a subsequent matching join.” Hence, it is not clear whether the problem depicted in Figure 12 can actually occur, since its occurrence depends on what a “matching join” exactly means (this requires a formal definition of the syntax).

The semantics of the synchronization bar is certainly not the only semantic problem with UML's activity diagrams. For example, activity diagrams are required to have a unique final node. When certain execution threads are still not finished, it is not clear what happens when this node is reached. Does the activity terminate, or does it terminate only when all threads have finished (“lazy termination policy”)?
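The two readings of the final node can be contrasted in a few lines of code. The following is only an illustrative sketch: the function name, its parameters, and the label "eager" for the first reading are assumptions introduced here, and nothing in it is taken from the UML definition.

```python
def activity_terminated(final_node_reached, running_threads, policy):
    """Decide whether an activity has terminated under a given policy.

    policy == "eager": reaching the final node ends the activity at once,
                       even if other execution threads are still running
                       (hypothetical label, not a UML term).
    policy == "lazy":  the activity ends only when the final node has been
                       reached and no execution thread is still running.
    """
    if policy not in ("eager", "lazy"):
        raise ValueError(f"unknown termination policy: {policy!r}")
    if not final_node_reached:
        return False
    if policy == "eager":
        return True
    return len(running_threads) == 0


# One thread is still busy when the final node is reached.
still_running = ["inform general practitioner"]
print(activity_terminated(True, still_running, "eager"))
print(activity_terminated(True, still_running, "lazy"))
```

The same model state yields different answers under the two policies, which is exactly the kind of ambiguity that a formal semantics would have to resolve.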

With regard to expressiveness, activity diagrams lack the concept of a discriminator (a concept supported by the Verve workflow management system and a special case of the partial join presented by Casati et al.53). Depending on the semantics of the synchronization bar, this concept can or cannot be expressed. Ter Hofstede and Kiepuszewski54 assigned a Petri net semantics to the discriminator. This semantics is inherently non-free-choice;55 hence, activity diagrams cannot capture the discriminator if the interpretation of the synchronization bar is that of the classical and-join.

Using a discriminator, the workflow structure of the care plan example can be modeled as in Figure 13. A discriminator (represented by an encircled D in the figure) is specified at the end of multiple parallel execution paths.

Figure 13. Process model of a care plan using a discriminator (D).

The first execution path to finish starts the discriminator. All other execution paths then have to complete, but they do not start the discriminator again. Therefore, in the care plan described in example 7, Dr. Nolan will perform the task “schedule joint visit” either if “delay” has finished or if “call significant other” has finished, whichever happens first (see Figure 13). He will never perform “schedule joint visit” twice.
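The behavior just described can be sketched in code. This is a minimal illustration under stated assumptions, not the formal Petri net semantics of the discriminator: the class `Discriminator`, its method names, and the callback are hypothetical names introduced here, and only the task labels come from example 7.

```python
class Discriminator:
    """Starts its successor on the first finished path and absorbs the rest."""

    def __init__(self, incoming_paths, start_successor):
        self.incoming_paths = set(incoming_paths)
        self.finished_paths = set()
        # Callback that starts the task following the discriminator.
        self.start_successor = start_successor

    def path_finished(self, path):
        if path not in self.incoming_paths:
            raise ValueError(f"unknown execution path: {path!r}")
        is_first = not self.finished_paths
        self.finished_paths.add(path)
        if is_first:
            # Only the first path to finish starts the successor task.
            self.start_successor(path)

    def all_paths_finished(self):
        # Later paths still have to complete, but they start nothing.
        return self.finished_paths == self.incoming_paths


started = []
d = Discriminator(
    ["delay", "call significant other"],
    lambda trigger: started.append(("schedule joint visit", trigger)),
)
d.path_finished("call significant other")  # first path: starts the task
d.path_finished("delay")                   # second path: absorbed
print(started)
print(d.all_paths_finished())
```

In this run "schedule joint visit" is started once, triggered by whichever path finishes first, and the later completion of the other path is recorded without starting the task again.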

There are, however, other expressiveness problems with activity diagrams. As shown by Ter Hofstede and Orlowska,56 for example, recursive decomposition (e.g., an activity diagram that contains a reference to itself), which is not supported by activity diagrams, adds expressive power. Also, it is not possible to express that an unbounded number of instances of the same activity can be running in parallel (this is referred to as “multiple instances with no a priori runtime knowledge” in Van der Aalst et al.57).

Like activity diagrams, task structures58 are used to express workflows (in fact, the graphic notation used in Figure 13 is that of task structures). As opposed to activity diagrams, task structures do have a formal basis51; their semantics is expressed in the algebra of communicating processes (ACP).59 Task structures provide full support for multiple instances, and the incorporation of the discriminator has been investigated formally by Edmond60 (although this did not take the form of an extension of the ACP translation). As recursive decomposition is allowed, all context-free languages can be expressed. Task structures are, thus, more expressive than activity diagrams. However, task structures do not have the full expressive power of Petri nets (since they do not explicitly support the notion of state, they have difficulties with state-based patterns; see the discussion in Van der Aalst et al.61). As a class of languages, Petri nets are the most expressive (although they cannot capture all context-free languages; see Peterson50 for example). However, it is well known that Petri net specifications easily become huge, complicated, and unstructured.62 Hence, comprehensibility is a real problem.

Table 1 compares activity diagrams, task structures, and Petri nets in terms of the proposed criteria. This table shows that none of the three modeling languages, although selected with care, satisfies all requirements. The results show, among other things, that UML's activity diagrams alone are not expressive enough for modeling in the context of care communication. State-based concepts, such as those naturally supported by Petri nets, have to be incorporated. These concepts may be provided through a link with other specification techniques, such as UML's statecharts, or the more formally defined state-transition diagrams, such as those employed by Barros et al.,63 where they provide a service model for task structures. Alternatively, state-based concepts might be directly integrated in the activity diagrams themselves, but this seems less desirable, since it would reduce comprehensibility. Monitoring care communication requires capturing transitions between various execution states such as accepted, rejected, or postponed.

Table 1. Language Comparison in a Care Coordination Context

                  UML     Task Structures     Petri Nets
Conceptual        +       +                   +
Formal                    +                   +
Expressive                +/–                 +
Suitable          +       +                   +/–
Comprehensible    +       +
Executable        +       +                   +
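The comparison in Table 1 can also be checked mechanically. The sketch below encodes the ratings as plain data and lists, per language, the requirements that are not fully met. The names `REQUIREMENTS`, `TABLE_1`, and `unmet_requirements` are introduced here for illustration only, and the placement of the empty cells is an assumption inferred from the surrounding discussion (no formal semantics for activity diagrams, limited expressiveness of activity diagrams, and comprehensibility problems with Petri nets).

```python
REQUIREMENTS = [
    "conceptual", "formal", "expressive",
    "suitable", "comprehensible", "executable",
]

# Ratings per language, in the order of REQUIREMENTS.
# "" marks a cell without a rating.
# ASSUMPTION: which column each empty cell belongs to is inferred from the
# text, not stated explicitly in the flattened table.
TABLE_1 = {
    "UML activity diagrams": ["+", "", "", "+", "+", "+"],
    "task structures": ["+", "+", "+/-", "+", "+", "+"],
    "Petri nets": ["+", "+", "+", "+/-", "", "+"],
}


def unmet_requirements(language):
    """Return the requirements that are not rated a plain '+'."""
    ratings = TABLE_1[language]
    return [req for req, rating in zip(REQUIREMENTS, ratings) if rating != "+"]


for language in TABLE_1:
    print(language, "->", unmet_requirements(language))
```

Under these assumptions every language has at least one requirement that is not fully met, which matches the observation that none of the three satisfies all requirements.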

Conclusions

When recognizing the need for development of domain-specific languages, it is important to establish criteria by which such development should be guided. In this paper, a set of criteria is presented, illustrated, and justified.

In brief, domain-specific languages should be conceptual, so that they are not forced to specify domain-irrelevant details; formal, to avoid ambiguity and allow for sophisticated automated support; expressive, to fully capture a problem; suitable, to capture a problem conveniently; and comprehensible, since their models should be communicated with non-computer experts. Since insight into the meaning of a model might be improved by its execution, domain-specific models also have to be executable.

Naturally, these requirements cannot be proved to be complete, since they were derived from experiences of the authors. Extensions or refinements may be necessary if, for example, in assigning formal semantics, the distinction between intensional and extensional64 must be made. However, it is not too difficult to see that these requirements are orthogonal; i.e., it is possible to violate each individual requirement without violating the others. We hope that this paper leads to the acceptance and dissemination of the presented requirements and furthers and guides development of domain-specific languages in medicine.

This work was performed at the Department of Medical Informatics of the University of Nijmegen and the Cooperative Information Systems Research Centre at Queensland University of Technology. Dr. van der Maas is currently with Roccare BV in The Netherlands; e-mail: <Arnoud@Roccare.nl>.

Footnotes

* See the resource section at the Health Level 7 Web site, available at http://www.hl7.org/.

Computer Associates, Islandia, New York; http://www.cai.com.

Verve, Inc., San Francisco, California; http://www.verve.com.

References

  • 1. Helder JC, Jager JC, Reinders A. Object-oriented Conceptual Modelling for Public Health. Bilthoven, The Netherlands: RIVM [National Institute of Public Health and the Environment], 1993. Publication 533-37.
  • 2. Musen MA, Tu SW, Das AK, Shahar Y. EON: a component-based approach to automation of protocol-directed therapy. J Am Med Inform Assoc. 1996;3:367–88.
  • 3. Dazzi L, Fassino C, Saracco R, Quaglini S, Stefanelli M. A patient workflow management system built on guidelines. Proc AMIA Annu Fall Symp. 1997:146–50.
  • 4. Delamarre D, Burgun A, Seka LP, Beux PL. Automated coding of patient discharge summaries using conceptual graphs. Methods Inf Med. 1995;34:345–51.
  • 5. Müller R, Thews O, Rohrbach C, Sergl M, Pommerening K. A graph grammar approach to represent causal, temporal and other contexts in an oncological patient record. Methods Inf Med. 1996;35:127–41.
  • 6. Seiver A, Holtzman S. Decision analysis: a framework for critical care. In: Gardner RM, Shabot MM (eds). Decision Support Systems in Critical Care. New York: Springer-Verlag, 1994:74–104.
  • 7. Hripcsak G. Writing Arden syntax medical logic modules. Comput Biol Med. 1994;24(5):331–63.
  • 8. ter Hofstede AHM. Information Modelling in Data Intensive Domains [PhD thesis]. Nijmegen: University of Nijmegen, 1993.
  • 9. ter Hofstede AHM, van der Weide TP. Expressiveness in conceptual data modelling. Data Knowl Eng. 1993;10(1):65–100.
  • 10. ter Hofstede AHM, Proper HA, van der Weide TP. Formal definition of a conceptual language for the description and manipulation of information models. Inf Syst. 1993;18(7):489–523.
  • 11. van der Maas AAF, ter Hofstede AHM, de Vries Robbé PF. Formal description of temporal knowledge in case reports. Artif Intell Med. 1999;16:251–82.
  • 12. van der Maas AAF, ter Hofstede AHM. Formal description of disease courses. Artif Intell Med. 2000;18(1):29–55.
  • 13. Cohen B. Justification of formal methods for system specification. Software Eng J. 1989;4(1):26–35.
  • 14. Spivey JM. Understanding Z: A Specification Language and Its Formal Semantics. Cambridge, UK: Cambridge University Press, 1988.
  • 15. Jones CB. Systematic Software Development using VDM. Englewood Cliffs: Prentice Hall, 1986.
  • 16. van Horenbeek I, Lewi J. Algebraic Specifications in Software Engineering: An Introduction. Berlin, Germany: Springer-Verlag, 1989.
  • 17. Hohenstein U, Engels G. Formal semantics of an entity-relationship-based query language. In: Kangassalo H (ed). Proceedings of the 9th International Conference on the Entity-Relationship Approach. Amsterdam, The Netherlands: North-Holland, 1991.
  • 18. Falkenberg ED, van der Pols R, van der Weide TP. Understanding process structure diagrams. Inf Syst. 1991;16(4):417–28.
  • 19. ter Hofstede AHM, van der Weide TP. Formalisation of techniques: chopping down the methodology jungle. Inf Software Technol. 1992;34(1):57–65.
  • 20. DeMarco T. Structured Analysis and System Specification. Englewood Cliffs: Prentice Hall, 1978.
  • 21. Zielhuis GA, Heydendael PHJM, Maltha JC, van Riel PLCM. Manual for medical scientific research [Handleiding medisch-wetenschappelijk onderzoek]. Utrecht, The Netherlands: Wetenschappelijke Uitgeverij Bunge, 1995.
  • 22. Brachman RJ. What IS-A is and isn't: an analysis of taxonomic links in semantic networks. IEEE Comput. 1983;16(10):30–6.
  • 23. Meyer B. Introduction to the Theory of Programming Languages. Englewood Cliffs, NJ: Prentice-Hall, 1990.
  • 24. Friedman C, Huff SM, Hersh WR, Pattison-Gordon E, Cimino JJ. The CANON Group's effort: working toward a merged model. J Am Med Inform Assoc. 1995;2:4–18.
  • 25. van Griethuysen JJ (ed). Concepts and Terminology for the Conceptual Schema and the Information Base. New York: ANSI, 1982. Publication ISO/TC97/SC5/WG3-N695.
  • 26. Lamport L. LaTeX: A Document Preparation System. Reading, Mass: Addison-Wesley, 1986.
  • 27. ter Hofstede AHM, Proper HA. How to formalize IT? Formalization principles for information systems development methods. Inf Software Technol. 1998;40(10):519–40.
  • 28. Hripcsak G, Clayton PD, Jenders RA, Cimino JJ, Johnson SB. Design of a clinical event monitor. Comput Biomed Res. 1996;29:194–221.
  • 29. Evans DA, Cimino JJ, Hersh WR, Huff SM, Bell DS. Toward a medical-concept representation language. J Am Med Inform Assoc. 1994;1(3):207–17.
  • 30. Ceusters W, Steurs F, Zanstra PE, van der Haring EJ, Rogers JE. From a time standard for medical informatics to a controlled language for health. Int J Med Inf. 1998;48:85–101.
  • 31. Rector AL, Bechhofer SK, Goble CA, Horrocks I, Nowlan WA, Solomon WD. The GRAIL concept modelling language for medical terminology. Artif Intell Med. 1997;9:139–71.
  • 32. Brachman RJ. An overview of the KL-ONE knowledge representation system. Cogn Sci. 1985;9(2):171–216.
  • 33. Campbell KE, Cohn SP, Chute CG, Shortliffe EH, Rennels G. Scalable methodologies for distributed development of logic-based convergent medical terminology. Methods Inf Med. 1998;37(4–5):426–39.
  • 34. Howard RA, Matheson JE. Influence diagrams. In: Howard R, Matheson J (eds). Readings on the Principles and Applications of Decision Analysis, vol II. Menlo Park, Calif: Strategic Decisions Group, 1984:719–62.
  • 35. Scheurer T. Foundations of Computing: Systems Development with Set Theory and Logic. Wokingham, UK: Addison-Wesley, 1994.
  • 36. Szyperski C. Component Software: Beyond Object-Oriented Programming. Harlow, UK: Addison-Wesley, 1997.
  • 37. Sowa JF. Conceptual Structures: Information Processing in Mind and Machine. Reading, Mass: Addison-Wesley, 1984.
  • 38. Engels G, Gogolla M, Hohenstein U, Hulsmann K, Löhr-Richter P, Saake G, Ehrich HD. Conceptual modelling of database applications using an extended ER model. Data Knowl Eng. 1992;9(4):157–204.
  • 39. Groenboom R, Saaman E, Rotterdam E, Renardel de Lavalette G. Formalizing anaesthesia: a case study in formal specification. In: Gaudel MC, Woodcock J (eds). Proceedings of Formal Methods Europe 1996 (Lecture Notes in Computer Science, vol 1051). Berlin, Germany: Springer-Verlag, 1996:120–39.
  • 40. Harel D. On visual formalisms. Commun ACM. 1988;31(5):514–30.
  • 41. Tse TH, Pong L. An examination of requirements specification languages. Comput J. 1991;34(2):143–52.
  • 42. Miller GA. The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol Rev. 1956;63:81–97.
  • 43. van der Maas AAF, Vogel W. Computerised charting of case reports. Med Inform Internet Med. 2000;25(1):45–61.
  • 44. Scully RE, Mark EJ, McNeely WF, McNeely BF. Case records of the Massachusetts General Hospital: case 52–1993. N Engl J Med. 1993;329(27):2019–26.
  • 45. Leymann F, Altenhuber W. Managing business processes as an information resource. IBM Syst J. 1994;33(2):326–48.
  • 46. Rector AL, Zanstra PE, Solomon WD, Rogers JE. Reconciling users' needs and formal requirements: issues in developing a reusable ontology for medicine. IEEE Trans Inf Technol Biomed. 1998;2:229–42.
  • 47. Musen MA. Domain ontologies in software engineering: use of Protégé with the EON architecture. Methods Inf Med. 1998;37(4–5):540–50.
  • 48. Fowler M. UML Distilled: Applying the Standard Object Modeling Language. Reading, Mass: Addison-Wesley, 1997.
  • 49. Booch G, Rumbaugh J, Jacobson I. The Unified Modeling Language User Guide. Reading, Mass: Addison-Wesley, 1998.
  • 50. Peterson JL. Petri Net Theory and the Modeling of Systems. Englewood Cliffs, NJ: Prentice Hall, 1981.
  • 51. ter Hofstede AHM, Nieuwland ER. Task structure semantics through process algebra. Software Eng J. 1993;8:14–20.
  • 52. Rumbaugh J, Jacobson I, Booch G. The Unified Modeling Language Reference Manual. Reading, Mass: Addison-Wesley, 1999.
  • 53. Casati F, Ceri S, Pernici B, Pozzi G. Conceptual modeling of workflows. In: Papazoglou M (ed). Proceedings of the OOER '95, 14th International Object-Oriented and Entity-Relationship Modelling Conference. (Lecture Notes in Computer Science, vol 1021). Berlin, Germany: Springer-Verlag, 1995:341–54.
  • 54. ter Hofstede AHM, Kiepuszewski B. Formal Analysis of Deadlock Behaviour in Workflows [technical report]. Brisbane, Australia: Queensland University of Technology/Mincom, 1999.
  • 55. Desel J, Esparza J. Free Choice Petri Nets. (Cambridge Tracts in Theoretical Computer Science, vol 40). Cambridge, UK: Cambridge University Press, 1995.
  • 56. ter Hofstede AHM, Orlowska ME. On the complexity of some verification problems in process control specifications. Comput J. 1999;42(5):349–59.
  • 57. van der Aalst WMP, ter Hofstede AHM, Kiepuszewski B, Barros AP. Workflow Patterns. Eindhoven, The Netherlands: BETA Research Institute, Eindhoven University of Technology, 2000. Technical report WP 47.
  • 58. ter Hofstede AHM, Orlowska ME, Rajapakse J. Verification problems in conceptual workflow specifications. Data Knowl Eng. 1998;24(3):239–56.
  • 59. Baeten JCM, Weijland WP. Process Algebra. Cambridge, UK: Cambridge University Press, 1990.
  • 60. Edmond D. Applications of Reflection for Cooperative Information Systems [PhD thesis]. Brisbane, Australia: Queensland University of Technology, 2000.
  • 61. van der Aalst WMP, Barros AP, ter Hofstede AHM, Kiepuszewski B. Advanced workflow patterns. In: Etzion O, Scheuermann P (eds). Proceedings of the 7th International Conference CoopIS 2000 on Cooperative Information Systems. (Lecture Notes in Computer Science, vol 1901). Berlin, Germany: Springer-Verlag, 2000:18–29.
  • 62. He X, Lee JAN. A methodology for constructing predicate transition net specifications. Software Pract Exp. 1991;21(8):845–75.
  • 63. Barros AP, ter Hofstede AHM, Proper HA. Towards real-scale business transaction workflow modelling. In: Olivé A, Pastor J (eds). Proceedings of the 9th International Conference CAiSE'97 on Advanced Information Systems Engineering. (Lecture Notes in Computer Science, vol 1250). Berlin, Germany: Springer-Verlag, 1997:437–50.
  • 64. Campbell KE, Oliver DE, Spackman KA, Shortliffe EH. Representing thoughts, words, and things in the UMLS. J Am Med Inform Assoc. 1998;5(5):421–31.

Articles from Journal of the American Medical Informatics Association : JAMIA are provided here courtesy of Oxford University Press
