Abstract
Clinical diagnosis is an iterative process because of partial and ambiguous information, changing conditions, and resource constraints. Although clinical diagnostic decision support systems have been successfully used to support clinical care, they face certain limitations in supporting clinical diagnosis as an iterative process. An approach is therefore needed that strengthens support for this iterative process in clinical diagnostic decision support systems. We model the clinical diagnosis process as hypothesis-driven story building, and implement a prototype clinical diagnostic decision support system that can generate and evaluate differential diagnoses, narrow and revise the diagnoses based on newly obtained information, and prioritize resources for information seeking.
INTRODUCTION
Clinical diagnosis is “the act or process of identifying or determining the nature and cause of a disease or injury through evaluation of patient history, examination, and review of laboratory data” [1]. A diagnosis is usually made based on a combination of factors, including medical history, signs and symptoms, and the results of various diagnostic procedures. During the process, a physician generates differential diagnoses, narrows or revises them, and finally confirms a diagnosis. Because of the complex nature of patient conditions and clinical settings, the physician makes these diagnoses in an iterative manner. She will revisit earlier steps and decisions based on new information, which may then change her initial diagnosis.
A physician faces three particular challenges during the diagnostic process: (a) Partial and ambiguous information: During the diagnosis process, information often arrives piecemeal over time and can be ambiguous. For instance, a symptom may not have been observed although it has developed, or an observed symptom may point to multiple diagnoses. (b) Changing conditions: A patient’s condition is dynamic and may change at any time during the diagnostic process. When this happens, the physician needs to revise the diagnosis and adjust the treatment plan accordingly. (c) Resource constraints: A physician often requests tests or scans to obtain more information and refine the diagnosis. However, many of these resources, such as a CT scan, are expensive and tightly scheduled. All these challenges are intertwined in a clinical diagnosis and must be dealt with in an integrated fashion. For example, the problem of partial information requires other relevant information to be collected, which may be constrained by resource availability; the newly collected information may indicate a change in the patient's condition and thus require revisions to the previous diagnosis. These features make the process of clinical diagnosis highly iterative.
Clinical diagnostic decision support systems (CDDSSs) have been developed and successfully used to assist physicians in clinical care [2]. However, many systems have certain limitations in supporting the iterative process of clinical diagnosis because they may not have taken all these challenges into account together. Therefore, an integrative approach that addresses all these challenges is required to enhance support for the iterative process of clinical diagnosis.
We model the process of clinical diagnosis as hypothesis-driven story building (HDSB), which aims to find explanations for observed information. It generates hypotheses based on partial and ambiguous information, prioritizes resources for information seeking, and revises previous hypotheses in light of newly obtained information. We also describe an early-stage HDSB-modeled CDDSS, implemented on an agent-based platform, that we believe can enhance support for the iterative process of clinical diagnosis.
BACKGROUND
A CDDSS is a computer-based clinical consultation system designed to offer real-time support for medical decision-makers [3]. Systems such as QMR [4], Iliad [5], DXplain [6], and CASNET/Glaucoma [7] do this by exploiting cause-effect medical knowledge.
The cause-effect relations between diseases and symptoms in CDDSSs are usually represented as rules, causal association networks, or Bayesian networks. Rule-based systems follow the “IF-THEN” mode and were mainly implemented in early expert systems such as MYCIN [8] and INTERNIST-I [4]. A causal association network (CASNET) is a graphical representation of qualitative cause-effect relations. In a CASNET, variables are divided into three levels: disease states, pathophysiological states, and observation states. Given observed symptoms, a path is traced all the way up to the disease states. CASNET has been implemented to diagnose glaucoma [7]. Finally, a Bayesian network is a graphical probabilistic representation of cause-effect relations in which each node is given a conditional probability table. For Iliad [5] and DXplain [6], Bayesian networks have been developed that involve thousands of nodes representing symptoms and diseases.
Regardless of which inference mechanism is used in these CDDSSs, they follow the same decision support model. In this model, the input includes clinical signs, symptoms, or laboratory results, and the output includes diagnostic and therapeutic recommendations. However, this model does not completely reflect the iterative nature of the actual clinical diagnosis process, and some functions required to handle the iterative process are missing. The input should be regarded not only as evidence for making a diagnosis, but also as information that needs to be actively collected based on the output that has been produced. For example, after the initial differential diagnosis is made from the entered information, current CDDSSs do not recommend which lab tests to order next given resource constraints, nor do they monitor for new symptoms and revise the differential diagnosis based on newly obtained information [2].
We extend the clinical decision support model to address the three issues discussed earlier by modeling the clinical diagnosis process as hypothesis-driven story building.
CLINICAL DIAGNOSIS PROCESS AS HYPOTHESIS-DRIVEN STORY BUILDING
A clinical diagnosis process focuses on identifying a diagnosis that can explain all the observed symptoms, signs, and lab results. At the start of a clinical diagnosis process, the observed symptoms and signs may be explained by multiple diagnoses. For example, if a patient complains about chest pain and shortness of breath, more than one diagnosis (e.g., pneumonia, pulmonary embolism, and tuberculosis) can explain these symptoms. However, none of these diagnoses can be confirmed at that moment. More information, such as results from a blood test, EKG, or CT scan, needs to be collected to identify the actual diagnosis.
The clinical diagnosis process can be generalized as a story building process, which aims to build a “story” that attempts to explain observed information by assuming the existence of other “missing” facts. For instance, in clinical diagnosis, a story is composed of a diagnosis and all the symptoms and lab results resulting from the diagnosis, even if some of those results are not available yet. The term story building was proposed by Gary Klein in his cognitive Recognition-Primed Decision (RPD) model to describe decision-makers’ behaviors when there is inadequate information for decision-making [9]. During a story building process, hypotheses are generated, evaluated, revised, narrowed, and finally confirmed; at the same time, information is collected to continually narrow down the hypotheses. We elaborate on the basic concept of story building, and propose a Hypothesis-Driven Story Building (HDSB) framework (Figure 1).
Figure 1: Hypothesis-Driven Story Building
As depicted in this framework, the hypothesis generation module starts when information is observed and creates a hypothesis space. In clinical settings, differential diagnoses are produced as symptoms are observed or reported. The hypothesis evaluation module then evaluates the hypotheses in the hypothesis space and produces a space of ranked hypotheses. In clinical terms, differential diagnoses can be ranked in many ways, for example by physicians’ experience, diagnosis prevalence, or probability.
The ranked hypotheses then drive the information seeking process. For example, lab tests that are expected to further differentiate the differential diagnoses are ordered at this stage. The newly obtained information will either re-trigger the hypothesis generation module, if it deviates significantly from earlier hypotheses, or trigger the hypothesis revision module, if it causes only minor changes to the hypothesis space or slight adjustments to the rankings. For instance, the lab results may further confirm a disease that was highly suspected, or rule out a disease that seemed unlikely.
Once the hypothesis is confirmed and actions are taken, the status of the situation needs to be monitored. For example, after treatment or surgery, the patient’s condition needs to be closely monitored, so that the previous diagnosis and treatment plan can be properly adjusted based on new information.
The HDSB framework serves not only as a theoretical model for describing clinical diagnosis, but also as a framework for a CDDSS.
APPROACH
We have developed algorithms for the major components of the HDSB framework. These algorithms use Bayesian networks as the technical foundation for knowledge representation. We describe these algorithms in this section.
Generating and Evaluating Differential Diagnosis
In a clinical context, each condition has one normal value (or range) and at least one abnormal value (or range). For example, a body temperature within 98.2 °F ± 1.3 °F is regarded as normal; values outside this range are considered abnormal. If a condition is already known to be normal or abnormal, the condition is noted but does not require further investigation. Therefore, a clinical diagnosis process is primarily interested in investigating unknown conditions. Furthermore, among the unknown conditions, it is more interested in identifying the abnormal ones than the normal ones. Algorithm 1 describes the process for identifying the unknown conditions that are suspected to be abnormal.
Algorithm 1: Generating Differential Diagnoses
Input:
BN: a Bayesian network;
O = {N1, N2, .., Nk}, where N1, N2, .., Nk are the k nodes that have been observed abnormal.
Output:
H: a collection of nodes other than O.
Body:
- H ← ∅.
- for each Ni in O, do:
  - for each Nj in BN, do:
    - if (Nj not in O) and (Nj and Ni are not d-separated) and (Nj not in H)
    - then H ← H ∪ {Nj}.
Return H. ▪
Algorithm 1 describes how a condition in a Bayesian network can be considered a new hypothesis. A new hypothesis needs to satisfy three conditions: (1) it is not already observed; (2) a modification on an observed condition will affect its probability; and (3) it has not been included in the hypothesis space.
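A minimal Python sketch of Algorithm 1 follows. It is illustrative only: the toy network is hypothetical, the helper names (`ancestors`, `d_separated`, `generate_hypotheses`) are not taken from the prototype, and it assumes that d-separation between Ni and Nj is tested given the remaining observations in O, which the algorithm statement leaves implicit. D-separation is checked with the standard moralized ancestral graph criterion.

```python
from collections import deque


def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def d_separated(parents, x, y, given):
    """True if x and y are d-separated by the set `given` in the DAG.

    `parents` maps each node to a list of its parent nodes.
    """
    keep = ancestors(parents, {x, y} | set(given))
    # Build the moral graph of the ancestral subgraph:
    # connect each node to its parents and "marry" co-parents.
    adj = {node: set() for node in keep}
    for child in keep:
        ps = [p for p in parents.get(child, ()) if p in keep]
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for i, a in enumerate(ps):
            for b in ps[i + 1:]:
                adj[a].add(b)
                adj[b].add(a)
    # x and y are d-separated iff no path joins them once `given` is removed.
    blocked = set(given)
    queue, seen = deque([x]), {x}
    while queue:
        node = queue.popleft()
        if node == y:
            return False
        for nxt in adj[node]:
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return True


def generate_hypotheses(parents, observed):
    """Algorithm 1: collect unobserved nodes d-connected to an observed node."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    hypotheses = []
    for ni in observed:
        # Assumption: condition on the other observations when testing ni.
        given = set(observed) - {ni}
        for nj in sorted(nodes):
            if nj in observed or nj in hypotheses:
                continue
            if not d_separated(parents, ni, nj, given):
                hypotheses.append(nj)
    return hypotheses


# Hypothetical toy network (child -> parents), for illustration only.
toy_network = {
    "Cough": ["Pneumonia", "Bronchitis"],
    "Fever": ["Pneumonia"],
    "Bronchitis": ["Smoking"],
    "Rash": ["Allergy"],
    "Pneumonia": [],
    "Smoking": [],
    "Allergy": [],
}
print(generate_hypotheses(toy_network, ["Cough"]))
```

In this toy network, observing an abnormal Cough makes every node connected to it a hypothesis, while Rash and Allergy are left out because no active path links them to the observation.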
For each hypothesized diagnosis, posterior probabilities can be obtained for each of its values using Bayesian inference methods. A condition that has higher probability on a suspected abnormal value is more likely to catch a physician’s attention, and thus should have a higher rank. Two criteria will be applied to determine the ranking of a hypothesis condition: (1) A condition that has lower probability on the normal value should have a higher ranking index; (2) A condition whose probabilities of abnormal values are more unevenly distributed should have a higher ranking index.
For a hypothesis condition that has $n$ values (one normal value and $n-1$ abnormal values), let its probability distribution over these values be $\langle p_0, p_1, \ldots, p_{n-1} \rangle$, where $p_0$ is the probability of the normal value and $p_1, \ldots, p_{n-1}$ are the probabilities of the abnormal values. The ranking index (RI) of the condition can then be measured using Eq. 1.
$$RI = \frac{1 - p_0}{E(P)}, \qquad E(P) = -\sum_{i=1}^{n-1} p_i \log_{10} p_i \qquad \text{(Eq. 1)}$$
In Eq. 1, $E(P)$ is the entropy [10] that measures how unevenly the abnormal values are distributed. Because a higher ranking index requires lower entropy on abnormal values, we divide $(1 - p_0)$ by the entropy. For example, for two conditions A and B, if the probability distribution on A is (0.2, 0.3, 0.5) and the probability distribution on B is (0.1, 0.2, 0.7), then $RI_A = 2.60$ and $RI_B = 3.63$ (computed with base-10 logarithms over the abnormal-value probabilities). $RI_B$ is higher because the abnormal values of B have a higher total probability and are more unevenly distributed.
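The worked example can be checked with a short Python sketch. It assumes that the ranking index has the form RI = (1 - p0) / E(P), with E(P) computed as the base-10 entropy term over the abnormal-value probabilities only; this form is inferred because it reproduces the two values quoted above. The function name `ranking_index` and the handling of the zero-entropy edge cases are assumptions for illustration.

```python
import math


def ranking_index(distribution):
    """Ranking index of a condition.

    `distribution` lists the probability of the normal value first,
    followed by the probabilities of the abnormal values.
    """
    p_normal = distribution[0]
    abnormal = distribution[1:]
    # Entropy term over abnormal values only, base-10 logarithm (assumed).
    entropy = -sum(p * math.log10(p) for p in abnormal if p > 0)
    if entropy == 0:
        # Assumed edge-case handling: no abnormal mass gives 0,
        # all mass on a single abnormal value gives infinity.
        return 0.0 if p_normal >= 1 else math.inf
    return (1 - p_normal) / entropy


print(round(ranking_index((0.2, 0.3, 0.5)), 2))  # condition A
print(round(ranking_index((0.1, 0.2, 0.7)), 2))  # condition B
```

Sorting the hypothesis conditions by this value in decreasing order yields the ranked hypothesis space.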
Revising Differential Diagnoses
When a new symptom is observed, the structure of the hypothesis space or its ranking may need to be revised. Algorithm 2 describes how the differential diagnoses should be revised according to a new symptom.
Algorithm 2: Revising Differential Diagnoses
Input:
BN: a Bayesian network;
O: previous observations;
H: previous differential diagnoses with rankings;
o_new: a new observation.
Output:
H: revised differential diagnoses with rankings.
Body:
- if o_new is a revision of an o in O,
  - if o_new revises an abnormal value to another abnormal value,
    - then recalculate the posterior probability of each h in H and update its ranking index accordingly.
  - else if o_new revises an abnormal value to the normal,
    - then from H delete all the hypotheses that are d-separated from any o in O by o_new.
- else if o_new is a new condition,
  - then generate the hypotheses for o_new and add them to H.
Recalculate the probability and ranking index for each h in H. ▪
According to Algorithm 2, the structure of the differential diagnoses will change if a condition changes from abnormal to normal, or if a condition is newly observed as abnormal. Only the rankings will be adjusted if a condition changes from one abnormal value to another.
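The three cases can be sketched in Python as a dispatch over the kind of new observation. This is a simplified illustration under stated assumptions, not the prototype's code: `generate` and `is_supported` are hypothetical callables standing in for Algorithm 1 and for the d-separation test, the deletion step is read as "drop a hypothesis once no remaining abnormal observation is connected to it", and the final recalculation of probabilities and ranking indices is left to the caller because it needs an inference engine.

```python
NORMAL = "normal"


def revise_hypotheses(observed, hypotheses, node, value, generate, is_supported):
    """Return the revised (observed, hypotheses) after observing node=value.

    observed     : dict mapping each abnormal observed node to its value
    hypotheses   : list of current hypothesis nodes
    generate     : generate(node, observed) -> candidate hypothesis nodes
    is_supported : is_supported(h, observed) -> True if h is still
                   connected to at least one abnormal observation
    """
    observed = dict(observed)
    hypotheses = list(hypotheses)
    if node in observed:
        if value == NORMAL:
            # Abnormal -> normal: the structure of H may shrink.
            del observed[node]
            hypotheses = [h for h in hypotheses if is_supported(h, observed)]
        else:
            # Abnormal -> another abnormal value: structure unchanged,
            # only the rankings need to be recomputed by the caller.
            observed[node] = value
    elif value != NORMAL:
        # Newly observed abnormal condition: the structure of H may grow.
        observed[node] = value
        for h in generate(node, observed):
            if h not in hypotheses:
                hypotheses.append(h)
    # An observed node is no longer a hypothesis.
    hypotheses = [h for h in hypotheses if h not in observed]
    return observed, hypotheses


# Hypothetical stand-ins for the Bayesian network queries.
support = {
    "Pneumonia": {"Cough", "Fever"},
    "Bronchitis": {"Cough"},
    "Flu": {"Fever"},
}


def toy_generate(node, observed):
    return [h for h, symptoms in support.items() if node in symptoms]


def toy_is_supported(h, observed):
    return any(o in support[h] for o in observed)


obs, hyps = {"Cough": "severe"}, ["Pneumonia", "Bronchitis"]
obs, hyps = revise_hypotheses(obs, hyps, "Fever", "high", toy_generate, toy_is_supported)
print(hyps)
obs, hyps = revise_hypotheses(obs, hyps, "Cough", NORMAL, toy_generate, toy_is_supported)
print(hyps)
```

A newly observed normal value is ignored in this sketch, since Algorithm 2 only covers revisions of earlier observations and newly observed abnormal conditions.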
Resource-Efficient Information Seeking
In clinical diagnosis, information such as lab tests and scans needs to be ordered, and obtaining it consumes resources. Where resource costs are high or resources are tightly scheduled, it is important to prioritize the use of resources for information seeking, which requires a measure of how efficiently a resource is used. Resource efficiency is determined by two factors: the test’s capability of confirming or falsifying a diagnosis, and the cost of the resource used for the test. For a diagnosis node, if its current probability distribution is $(p_{01}, p_{02}, \ldots, p_{0n})$, its probability distribution after test $T$ returns result $v_i$ is $(p_{i1}, p_{i2}, \ldots, p_{in})$, and the resource cost is $c_i$, then the resource efficiency index (EI) can be calculated with Eq. 2.
| (Eq. 2) |
where $V(T = v_i)$ measures how significantly a test returning the value $v_i$ changes the current probability distribution of a diagnosis. Eq. 2 calculates the test’s average capability for a particular diagnosis per unit cost. Based on the resource efficiency index, a physician can select the most efficient tests for a patient, rather than ordering unnecessary or redundant tests.
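The idea of an efficiency index can be illustrated with a small Python sketch. The concrete choices here are assumptions made only for illustration and should not be read as the definition of Eq. 2: the change measure is taken to be the sum of absolute differences between the prior and posterior distributions, the possible test results are weighted equally when averaging, and the names `change_magnitude` and `efficiency_index` are hypothetical.

```python
def change_magnitude(prior, posterior):
    """Assumed change measure: sum of absolute probability differences."""
    return sum(abs(q - p) for p, q in zip(prior, posterior))


def efficiency_index(prior, outcomes):
    """Average change in the diagnosis distribution per unit cost.

    prior    : current distribution of the diagnosis node
    outcomes : list of (posterior, cost) pairs, one per possible test result
    """
    if not outcomes:
        raise ValueError("a test needs at least one possible result")
    total = 0.0
    for posterior, cost in outcomes:
        if cost <= 0:
            raise ValueError("cost must be positive")
        total += change_magnitude(prior, posterior) / cost
    # Assumption: every possible result is weighted equally.
    return total / len(outcomes)


prior = (0.5, 0.5)
# Two hypothetical tests with the same diagnostic power but different cost.
expensive_scan = [((0.9, 0.1), 10.0), ((0.2, 0.8), 10.0)]
cheap_lab_test = [((0.9, 0.1), 1.0), ((0.2, 0.8), 1.0)]
print(efficiency_index(prior, expensive_scan))
print(efficiency_index(prior, cheap_lab_test))
```

Under these assumptions, a test that shifts the distribution equally but costs less receives a higher index, which matches the intent of prioritizing efficient tests.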
PROTOTYPE IMPLEMENTATION
The HDSB framework and its algorithms have been implemented on an agent-based platform using JavaBayes [11]. Figure 2 presents the GUI of the prototype agent-based CDDSS. The main window displays the Bayesian network developed for monitoring emergency care patients [12]. The window on the right side displays the ranked differential diagnoses with their probabilities and ranking indices.
Figure 2: A Prototype of Agent-based CDDSS
In the main window, each node of the Bayesian network represents a condition in the clinical cause-effect relationship. Different types of nodes are colored differently: observed nodes are colored in blue and unobserved nodes are colored in grey.
Based on the observed nodes, all hypothesized causal nodes (differential diagnoses) are listed in the right window. For each hypothesized node, the posterior probabilities of its values are visualized with a probability bar. From the posterior probabilities of all its values, the ranking index of the node is calculated using Eq. 1. All the suspected nodes are listed in decreasing order of rank.
Once a condition is newly observed, its node will change its color from grey to blue. At the same time, the hypothesis nodes and the posterior probabilities of their values will be updated automatically. The list of hypothesis nodes will also be updated accordingly. Therefore, at any time, a user will receive up-to-date information and recommendations.
Once a hypothesis node in the right window is clicked, its corresponding node in the Bayesian network will be shown in red. A pop-up menu appears when a node is right-clicked. If a node is highly suspected (e.g., >95%), the user can select Diagnose to confirm the diagnosis. If more information is needed to further verify a hypothesis node, the user can select Recommend Test, which will return a recommended node for ordering a test.
The recommended node will be colored in orange. For example, as shown in Figure 2, Hypovolemia is a suspected node, and a test on StrokeVolume is recommended for further verifying the suspected node. Once a test on a node is recommended, the user can select to Order the test. In our implementation, the ordering request will be sent to another agent for response.
We are currently conducting experiments to test the performance of the agent-based CDDSS. Physicians are being recruited to participate in the experiment. They will be divided into an experimental group (with decision support) and a control group (without decision support) and asked to make diagnostic decisions for different scenarios. Participants in the experimental group are expected to make fewer misdiagnoses in less time and with fewer resources.
CONCLUSION
Because of the challenges of partial and ambiguous information, changing conditions, and resource constraints, clinical diagnosis needs to be supported as a dynamic and iterative process. We extended the current clinical decision support paradigm by modeling clinical diagnosis as hypothesis-driven story building, and implemented it in an agent-based prototype.
REFERENCES
- 1. Diagnosis. The American Heritage Dictionary of the English Language. 4th Edition. Houghton Mifflin Company; 2003.
- 2. Kong G, Xu D-L, Yang J-B. Clinical Decision Support Systems: A Review on Knowledge Representation and Inference Under Uncertainties. International Journal of Computational Intelligence Systems. 2008;1(2):159–67.
- 3. Musen MA, Shahar Y, Shortliffe EH. Clinical Decision-Support Systems. In: Shortliffe EH, Perrault LE, Wiederhold G, Fagan LM, editors. Medical Informatics -- Computer Applications in Health Care and Biomedicine. 2nd ed. New York: Springer; 2000. pp. 573–609.
- 4. Myers JD. The Background of INTERNIST-I and QMR. In: Blum BI, Duncan K, editors. A History of Medical Informatics. New York: ACM Press; 1990. pp. 427–33.
- 5. Warner HR, Bouhaddou O. Innovation Review: Iliad -- A Medical Diagnostic Support Program. Top Health Inf Manage. 1994;14(4):51–8.
- 6. Barnett GO, Famiglietti KT, Kim RJ, Hoffer EP, Feldman MJ. DXplain on the Internet. Proceedings of the 1998 AMIA Annual Fall Symposium. 1998. pp. 607–11.
- 7. Kulikowski CA, Weiss SM. Representation of Expert Knowledge for Consultation: The CASNET and EXPERT Projects. In: Szolovits P, editor. Artificial Intelligence in Medicine. Boulder, Colorado: Westview; 1982.
- 8. Shortliffe EH. Computer-based Medical Consultations: MYCIN. New York: Elsevier; 1976.
- 9. Klein G. Recognition-primed decisions. In: Rouse WB, editor. Advances in Man-Machine Systems Research. Greenwich, CT: JAI Press; 1989. pp. 47–92.
- 10. Shannon CE. A Mathematical Theory of Communication. The Bell System Technical Journal. 1948;27:379–423, 623–56.
- 11. Cozman FG. JavaBayes: Bayesian Networks in Java. 2001. Available from: http://www.cs.cmu.edu/~javabayes/
- 12. Beinlich I, Suermondt G, Chavez R, Cooper G. The ALARM Monitoring System: A Case Study with Two Probabilistic Inference Techniques for Belief Networks. 2nd European Conference on Artificial Intelligence and Medicine; Springer-Verlag; 1989.


