Abstract
Structured entry forms for clinical records should be updated to take into account the physicians’ needs during consultation and advances in medical knowledge and practice. We updated the computerized medical record form of a hypertension clinic, based on its previous use and clinical guidelines. A statistical analysis of previously completed forms identified several unnecessary items rarely used by clinicians. A terminological analysis of guidelines and of free-text answers on completed forms identified several new topics relevant to current clinical practice. We therefore added new items to the form and some topics previously recorded as free text were itemized. We collaborated with clinicians in the interpretation of the results of the statistical and terminological analyses, which were used as the starting point and guide for this updating process.
Introduction
Medical record forms are becoming a widespread component of electronic health record systems. Structured entry increases the quality and reuse of patient data for clinical care (including decision support), clinical audit and research, and medical coding for resource allocation and health services planning.1 However, unforeseen situations and particular cases can be transcribed only by free-text data entry. Semi-structured forms, combining structured and free-text entry, are the best compromise between input control (ensuring data completeness and accuracy), input flexibility (accommodating particular features of patients) and input simplicity (facilitating data entry).2
Once a semi-structured form is brought into use, the issue of its maintenance arises. Two broad areas have to be considered: layout and content. The organization of the form may be inappropriate, increasing the burden of data entry. The phrasing of the questions may be ambiguous, leading to variability in data entry. Alternatively, the content of the form may not meet clinicians’ needs in actual practice, due to inadequate modeling from the outset or because clinical knowledge and practice have evolved and the form is outdated. Other factors may make it necessary to update the content of the form. For example, new laws may require the systematic entry of certain data in all clinical records.
We present an approach for updating the content and organization of a computerized medical record form used in a hypertension clinic. Natural language processing has been used by other authors to improve clinical data entry templates, based on patterns of previous use.3 Similarly, we carried out a terminological analysis of free-text answers from previously completed forms. Two additional dimensions were also considered: statistical analysis of the response rate for each question and terminological analysis of clinical guidelines. Previous use of the form reflects the physicians’ real needs and clinical guidelines reflect advances in medical knowledge and practice.
Materials and methods
Medical record form –
Computerized medical records have been used over the last 30 years in the hypertension unit of our hospital.4 About 5,000 semi-structured records are entered each year. The entry form has four main sections (Table 1) and currently includes 176 optional questions, covering data of various types: Boolean, checklist, date, numerical or free-text. Answers are predefined for Boolean questions (yes/no or present/absent) and checklist questions (list of possible answers, among which users make either single or multiple choices). The goal of the update process was to obtain and analyze empirical material, with the aim of identifying items that should be deleted, added or reorganized in this semi-structured form.
Table 1.
Original organization of the medical record form for hypertensive patients.
| Sections | Subsections |
|---|---|
| Hypertension and other cardiovascular risk factors | Familial history |
| | Personal history |
| | Current treatment |
| Medical history | By body systems |
| Current findings | Blood pressure measurements |
| | Findings by target organs |
| | Laboratory test results |
| Conclusion | Hypertension etiology |
| | Cardiovascular risk |
| | Other conclusions and management plan |
Statistical analysis of completed entry forms –
For the set of 5,109 records entered in 2005, we determined the response rate for each question and the rate of use of each of the predefined answers for Boolean and checklist questions. If a question or a predefined answer was used in fewer than 50 records (about 1% of the records), its deletion was considered.
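The screening rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the record layout (one dict per record, mapping a question label to the list of answers entered) and the function name `rarely_used` are assumptions made for the example.

```python
from collections import Counter

THRESHOLD = 50  # records; about 1% of the 5,109 records, as stated in the text


def rarely_used(records, questions, threshold=THRESHOLD):
    """Return questions and (question, answer) pairs used in fewer than `threshold` records.

    `records` is a list of dicts mapping a question label to the list of
    answers entered for it (hypothetical layout). `questions` maps each
    question label to its predefined answers (empty list if none).
    """
    question_counts = Counter()
    answer_counts = Counter()
    for record in records:
        for question, answers in record.items():
            if answers:  # the question was answered in this record
                question_counts[question] += 1
                for answer in set(answers):  # count each answer once per record
                    answer_counts[(question, answer)] += 1

    rare_questions = [q for q in questions if question_counts[q] < threshold]
    rare_answers = [
        (q, a)
        for q, predefined in questions.items()
        for a in predefined
        if answer_counts[(q, a)] < threshold
    ]
    return rare_questions, rare_answers
```

The output of such a function is only a list of candidates for deletion; in the process described here, the final decision on each item was left to clinicians.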
Terminological analysis of completed entry forms –
Free-text answers from these 5,109 records were made anonymous. Their concatenation resulted in a 350,000-word corpus in French. This corpus was analyzed with the Syntex and Upery tools.5 Syntex performs a morpho-syntactic analysis of sentences and yields occurrence statistics and a dependence network of terms. Upery performs distributional analysis on the dependence network.
Numerous synonymous terms may refer to the same concept. For example, “poorly controlled hypertension” and “unsatisfactory controlled hypertension” both refer to the concept of uncontrolled hypertension. Some concepts may also be expressed in positive and negative forms (e.g. “complicated hypertension” and “uncomplicated hypertension”). Syntex identifies and counts the occurrences of self-standing terms. The occurrences of a given concept were counted by manually summing all the occurrences of all synonyms referring to the concept concerned in a positive or negative way. If a concept occurred more than 50 times in free-text answers, the inclusion of a corresponding item in the form (question or predefined answer) was considered.
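The manual summing step can be illustrated with a short Python sketch. It assumes that term occurrence counts have already been exported from the term extractor as a plain dict and that the synonym-to-concept mapping has been built by hand; the mapping, the concept labels and the function names below are hypothetical and are not part of Syntex.

```python
from collections import defaultdict

# Hypothetical hand-built mapping from surface terms to concepts.
# Positive and negative forms are mapped to the same concept, as in the text.
SYNONYMS = {
    "poorly controlled hypertension": "uncontrolled hypertension",
    "unsatisfactory controlled hypertension": "uncontrolled hypertension",
    "complicated hypertension": "hypertension complication status",
    "uncomplicated hypertension": "hypertension complication status",
}


def concept_counts(term_counts, synonyms):
    """Sum term occurrences over all synonyms of each concept."""
    totals = defaultdict(int)
    for term, count in term_counts.items():
        concept = synonyms.get(term)
        if concept is not None:  # terms without a mapping are ignored
            totals[concept] += count
    return dict(totals)


def candidates(term_counts, synonyms, threshold=50):
    """Concepts occurring more than `threshold` times are candidates for a new item."""
    totals = concept_counts(term_counts, synonyms)
    return sorted(c for c, n in totals.items() if n > threshold)
```

With toy counts of 30 and 25 for the two synonyms of uncontrolled hypertension, the concept totals 55 occurrences and passes the threshold, even though neither synonym does so on its own.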
The dependence network built by Syntex embeds the syntactic connections between terms. For example, “renal artery stenosis” is derived from “stenosis” and “renal artery” linked by the preposition “of”. Upery displays terms sharing a large number of syntactic dependences. The probability of these terms being semantically related is high. For example, the terms “primary aldosteronism” (PA) and “renal artery stenosis” (RAS) are often used in the same syntactic context, like “proven PA”/“proven RAS”, “being caused by PA”/“being caused by RAS”, “showing PA”/“showing RAS”, etc. This commonness of linguistic use is strongly suggestive of semantic similarity, in this case due to primary aldosteronism and renal artery stenosis both being causes of hypertension. The dependence network and distributional analysis provided clues to the interrelationship of concepts in the physicians’ representation of their field. These clues were used to propose a more convenient organization of questions in the form.
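The idea behind distributional analysis can be shown with a simplified Python sketch that pairs terms sharing several syntactic contexts. This is not Upery's actual algorithm, whose details are not given here: the context encoding (a string with `_` marking the term slot), the `min_shared` cutoff and the function name are assumptions made for illustration.

```python
from itertools import combinations


def shared_contexts(contexts_by_term, min_shared=2):
    """Return term pairs sharing at least `min_shared` syntactic contexts.

    `contexts_by_term` maps a term to the set of contexts in which it was
    observed, e.g. "proven _" for "proven PA". Pairs are returned with the
    number of shared contexts, most similar first.
    """
    pairs = []
    for a, b in combinations(sorted(contexts_by_term), 2):
        common = contexts_by_term[a] & contexts_by_term[b]
        if len(common) >= min_shared:
            pairs.append((a, b, len(common)))
    return sorted(pairs, key=lambda pair: -pair[2])


# Toy data: the first two entries reuse the contexts quoted in the text,
# the remaining contexts are invented for the example.
contexts = {
    "primary aldosteronism": {"proven _", "being caused by _", "showing _"},
    "renal artery stenosis": {"proven _", "being caused by _", "showing _", "stenting of _"},
    "plasma renin": {"measurement of _", "showing _"},
}
similar = shared_contexts(contexts)
```

On this toy input, only primary aldosteronism and renal artery stenosis are paired, because they share three contexts, whereas plasma renin shares a single context with each of them.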
Terminological analysis of guidelines –
The manual analysis of a single guideline might have yielded satisfactory results, but we thought that pooling several guidelines and using natural language processing tools would be more likely to decrease the risk of overlooking important concepts. Eight hypertension guidelines in English, published between 1999 and 2005 by various national or international organizations, were therefore assembled to constitute a 56,000-word corpus. This corpus was also analyzed with Syntex and Upery, which can work with English texts. All the resulting terms, however many times they occurred, were screened. This made it possible to extract clinically relevant concepts that were not specifically covered by items on the form. We therefore considered including items related to these concepts. Some structuring concepts were also identified by terminological analysis. This made it possible to develop a more convenient organization of questions into sections and subsections of the form.
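The pooling and screening steps can be approximated by the following Python sketch. It assumes that each guideline has already been reduced to a list of extracted terms and that coverage by the form can be tested by exact matching on normalized labels, which is a simplification: in the process described here, the screening relied on clinical judgment. The function names are hypothetical.

```python
from collections import Counter


def pool_terms(term_lists):
    """Pool the terms extracted from several guidelines into one counter."""
    pooled = Counter()
    for terms in term_lists:
        pooled.update(term.strip().lower() for term in terms)
    return pooled


def uncovered(pooled, form_items):
    """Return all pooled terms not matched by a form item, whatever their frequency."""
    covered = {item.strip().lower() for item in form_items}
    return sorted(term for term in pooled if term not in covered)
```

No frequency threshold is applied here, in line with the choice to screen every term from the guideline corpus, however many times it occurred.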
Collaboration with clinicians –
Final decisions were taken by a group of senior physicians from the unit, after the presentation and discussion of analyses by a clinician with a knowledge engineering background (first author). We arranged three meetings: at the first, the knowledge engineering process was explained; at the second, the suggested changes were validated or rejected; and at the third, the final draft was reviewed and approved.
Results
Statistical analysis of completed entry forms –
Among the 176 questions, 19 were answered fewer than 50 times in the set of 5,109 forms. Eight of these questions were related to the Fagerström nicotine dependence score, and were completed only three times. Three other questions concerned the dates of uncommon events among patients seen in the hypertension clinic: pulmonary edema, myocardial infarction and transient ischemic attack. The eight remaining questions concerned ankle blood pressure measurements and ankle brachial pressure index, which are useful when evaluating peripheral arterial disease, but too time-consuming for consultations for hypertension. The questions concerning Fagerström score, ankle blood pressure measurements and ankle brachial pressure index were deleted. Pulmonary edema and myocardial infarction were added as predefined answers for the checklist question “History of heart disorders?”. “Transient ischemic attack” was pooled with “Stroke” under the new label “Cerebrovascular event” as a predefined answer to the checklist question “History of neuropsychiatric disorders?”.
Among the 38 predefined answers to the 19 Boolean questions (“yes” or “no”), only those related to the four Boolean questions of the Fagerström nicotine dependence score were used fewer than 50 times. The decision to remove these questions had already been taken.
Among the 203 predefined answers to the 39 checklist questions, 78 were used fewer than 50 times. Thirty-five of these responses concerned features of the clinical examination. Some concerned features that are generally uncommon, such as diastolic heart murmur (as an answer to the question “Heart murmur?”), whereas others concerned features uncommon in the population attending the hypertension clinic, such as metrorrhagia (“Urogenital findings?”). Some of these features were rarely acknowledged by patients despite being frequent, such as impotence (“Urogenital findings?”). Finally, hypertension specialists rarely look for some features, such as right ulnar missing pulse (“Missing pulses?”). The remaining rarely used predefined answers related to uncommon disorders in the medical history (36), such as Takayasu disease (“Disorders of blood vessels?”), or uncommon drugs (6), like hirudin and derivatives (“Antithrombotic drugs?”). Forty-four of these 77 predefined answers were removed from the form. The whole question “Heart murmur?” was removed, because data on heart murmur are not particularly useful in hypertension management. Of course, the presence of a heart murmur can still be recorded as an answer to the free-text question “Other heart findings?”. The last rarely used predefined answer was “Coronary stenosis” (“Disorders of blood vessels?”), which was recorded only 12 times, even though 181 patients were identified as having coronary insufficiency, almost invariably due to coronary stenosis. It seems that the clinicians did not use this predefined answer because they did not know what it meant or how it differed from another predefined answer, “coronary insufficiency” (“History of heart disorders?”). The users did not understand whether all patients with definite coronary stenosis should be identified as such, or whether only those with coronary stenosis in the absence of coronary insufficiency should be identified by this response.
The other 33 rarely used predefined answers were judged too important to be deleted, for clinical research if not for clinical care.
Terminological analysis of completed entry forms –
We identified 103 new concepts occurring more than 50 times in the corpus but not covered by items on the form. Distributional analysis clustered these concepts, facilitating the definition of relevant classes. Nine classes were defined: hypertension characteristics (7 concepts), hypertension etiology (3), medical history (3), drug intolerance findings (12), other findings (2), laboratory tests and results (10), imaging procedures and results (30), drug names (27), therapeutic decisions (9).
Hypertension characteristics concepts were used to add a structured subsection in the conclusion section, concerning five aspects of the pattern of hypertension: timing of hypertension (permanent, episodic, white-coat or masked hypertension), component of blood pressure measurements displaying an increase (global, isolated systolic or isolated diastolic hypertension), blood pressure levels (mild, moderate or severe hypertension), target organ damage (complicated or uncomplicated hypertension) and therapeutic control (controlled, uncontrolled or resistant hypertension).
The predefined answers to the checklist question “Hypertension etiology?” were slightly modified: Sleep apnea syndrome was added and primary aldosteronism was subdivided into Conn’s adenoma and bilateral adrenal hyperplasia. One of the medical history concepts, migraine, was added to the predefined answers to the question “History of neuropsychiatric disorders?”. The laboratory results section of the form was enriched with slots for values frequently reported in free-text comments: plasma renin, plasma aldosterone, urinary aldosterone, urinary cortisol and urinary catecholamines concentrations.
It would have been too cumbersome for clinicians to enter the imaging results in a structured way. Two new free-text questions, called “Vascular imaging?” and “Uronephrologic imaging?”, were created to make it possible to copy and paste the conclusions of imaging reports. This makes it possible to record information more consistently, although still in the form of free text.
Terminological analysis of guidelines –
Our terminological analysis of guidelines provided terms related to the critical appraisal of medical literature (e.g. “meta-analysis” or “placebo”) and to general clinical knowledge (“treatment threshold” or “drug metabolites”). It also identified 95 clinical concepts not specifically covered by items on the form (32 of which were already found in the free-text answers). The nine conceptual classes identified during the terminological analysis of completed entry forms were appropriate for the sorting of all but three of these concepts. The concepts that could not be sorted with these classes all related to ethnicity. The numbers of concepts in each class were as follows: hypertension characteristics (13 concepts), hypertension etiology (9), medical history (9), drug intolerance findings (3), other findings (5), laboratory tests and results (5), imaging procedures and results (14), drug names (5), therapeutic decisions (29).
Concepts relating to hypertension characteristics were used to refine the new structured subsection about hypertension pattern. Additional concepts concerning the etiology of hypertension were subtypes of uncommon monogenic hypertensions, which occur too rarely to be added to the hypertension etiology checklist as new items. Some important drug contraindications found in guidelines were added to the medical history section of the form. These contraindications included, for example, Raynaud’s phenomenon and gout. The concept of dementia appeared in guidelines as a newly recognized complication of hypertension and was added as a predefined answer to the checklist question “History of neuropsychiatric disorders?”. Finally, we organized the management plan subsection by dividing it into four more specific free-text questions about the patient’s treatment goals, intended lifestyle modifications, intended drug changes and follow-up plan (including additional tests). The changes made to the entry form are shown in bold typeface in Table 2.
Table 2.
New organization of the medical record form for hypertensive patients at the end of the whole process. Subsections with content changes are indicated in bold typeface.
| Sections | Subsections |
|---|---|
| Hypertension and other cardiovascular risk factors | Familial history |
| | Personal history |
| | Current treatment |
| Medical history | By body systems |
| Current findings | Blood pressure measurements |
| | Findings by target organs |
| | Laboratory test results |
| Conclusion | Hypertension etiology |
| | Hypertension pattern |
| | Cardiovascular risk |
| | Treatment goals |
| | Lifestyle modifications |
| | Drug changes |
| | Follow-up plan |
| | Other conclusions |
Discussion
Semi-structured forms are currently seen as the most convenient means for physicians to enter clinical data into computerized clinical records. However, if they are not modified to take into account the physicians’ actual needs and advances in medical knowledge, they are unlikely to remain acceptable and valid. We updated a form heavily used in a hypertension clinic, to try to overcome these problems.
The need for collaboration between clinicians and knowledge engineers –
The knowledge engineering process produces useful empirical data, which can be used as a starting point for the update process. Statistical analysis shows which questions and predefined answers are actually used by clinicians. Terminological analysis of free-text entries can be used to restructure the information currently entered in the form of free text. Terminological analysis of guidelines can be used to take advances in medical knowledge into account.
Of course, the results of statistical and terminological analyses must be interpreted with the help of expert clinicians, and changes should only be made with the agreement of these experts. For example, it may be valuable for an uncommon condition with important consequences for clinical care or research to be itemized, even if it is rarely used. Indeed, the deletion of the item may result in clinicians overlooking the condition because the checklist effect is lost, and would prevent searches of the database being made for that item for research purposes. Conversely, clinicians may often record certain data in free text that are of no great importance for clinical care (to ensure that the data recorded are as exhaustive as possible, for example). Thus, the frequent occurrence of a concept in free-text answers does not guarantee the clinical relevance of that concept and it should not be included as an item in the entry form without first being validated by domain specialists. These issues may be difficult to resolve, but no disagreement occurred between the physicians we worked with.
Limitations of the knowledge engineering process –
Arbitrary or subjective choices may be made at least at two levels of our approach. First, the threshold above which concepts from the free-text answers corpus were considered for inclusion in the structured part of the form was chosen as a compromise between information load and feasibility. A lower threshold would have produced more candidates for inclusion, but would also have increased the burden of the knowledge engineering process. Second, although guided by tools and statistical results, the terminological analysis is open to subjective decisions, in the identification of synonyms or the definition of the conceptual classes to which the new concepts belong, for example.
Limitations of the approach in terms of extensiveness –
This update approach made it possible to revise the content of the form systematically, to take into account the needs of the users and advances in medical knowledge. We also reorganized the form, to some extent. However, the systematic revision of the phrasing and user-friendliness of the form would require qualitative feedback from the whole spectrum of possible users. The views of naive users, such as medical students or interns, would be particularly useful. This shortcoming highlights the need to integrate more dimensions into the update process than allowed by the approach presented here. The issue of ambiguous question labels is being addressed in an ongoing qualitative study of how users with different levels of clinical expertise understand these labels. We hope to reduce the variability in answers by improving question labels.
Limitations of the approach in terms of generalization –
Guidelines cover only a small fraction of the clinical problems met in some settings, such as general practice or internal medicine. They therefore only partially reflect the state of useful medical knowledge in such settings. Moreover, the free-text parts of clinical records in these contexts are unlikely to show recurrent patterns that could be detected by terminological analysis, because of the variety of clinical circumstances. For the same reasons, a medical record form in these settings is likely to be only coarsely structured and updates are more likely to be triggered to improve user-friendliness than in response to content issues. By contrast, it should be possible to implement our approach fully in specialized domains covered by guidelines and dealing with recurrent clinical problems.
Conclusion
We describe an approach for revising the content and organization of a medical record form, based on an analysis of its previous use (reflecting clinicians’ needs in practice) and an analysis of recent clinical guidelines (reflecting changes in medical knowledge). The results of statistical and terminological analyses provide empirical data to be used as a starting point and guide for the update process. However, our experience suggests that working with users to interpret these data increases the pertinence and acceptability of the update result. It should be possible to reproduce this approach in other clinical contexts, provided that these contexts are specialized enough to be at least partly covered by guidelines and that the free-text parts of clinical records show some recurrent patterns.
References
- 1. Powsner SM, Wyatt JC, Wright P. Opportunities for and challenges of computerisation. Lancet. 1998;352:1617–1622. doi: 10.1016/S0140-6736(98)08309-3.
- 2. Tange HJ, Hasman A, de Vries Robbé PF, Schouten HC. Medical narratives in electronic medical records. Int J Med Inform. 1997;46:7–29. doi: 10.1016/s1386-5056(97)00048-8.
- 3. Wilcox AB, Narus SP, Bowes WA 3rd. Using natural language processing to analyze physician modifications to data entry templates. Proc AMIA Symp. 2002:899–903.
- 4. Degoulet P, Chatellier G, Devries C, Lavril M, Menard J. Computer-assisted techniques for evaluation and treatment of hypertensive patients. Am J Hypertens. 1990;3:156–163. doi: 10.1093/ajh/3.2.156.
- 5. Charlet J, Bachimont B, Jaulent M. Building medical ontologies by terminology extraction from texts: an experiment for the intensive care units. Comput Biol Med. 2006;36:857–870. doi: 10.1016/j.compbiomed.2005.04.012.
