AMIA Annual Symposium Proceedings. 2005;2005:814–818.

Medical Textbook Summarization and Guided Navigation using Statistical Sentence Extraction

Gregory Whalen 1
PMCID: PMC1560740  PMID: 16779153

Abstract

We present a method for automated medical textbook and encyclopedia summarization. Using statistical sentence extraction and semantic relationships, we extract sentences from text returned as part of an existing textbook search (similar to a book index). Our system guides users to the information they desire by summarizing the content of each relevant chapter or section returned through the search. The summary is tailored to contain sentences that specifically address the user’s search terms. Our clustering method selects sentences that contain concepts specifically addressing the context of the query term in each of the returned sections. Our method examines conceptual relationships from the UMLS and selects clusters of concepts using Expectation Maximization (EM). Sentences associated with the concept clusters are shown to the user. We evaluated whether our extracted summary provides a suitable answer to the user’s question.

Introduction

Textbooks aim to balance comprehensive content with ease of use. As content is added, it becomes difficult for the writer and publisher to confine information about a particular topic to one specific set of pages. While heavily cross-referenced content is desirable for completeness and readability, it makes finding information relevant to a particular question difficult. Readers typically rely on an index search or online query to return all pages broadly relevant to their query.

We propose a method for summarizing content across several books, chapters, or pages that are retrieved via some external textbook search. Most textbook searches, like a simple index search, return a simple list of chapters or sections that might contain information relevant to both the search term and its context. The user often explores each section to determine whether its content is relevant. Since most search engines will return pages of interest (those containing words matching or related to the query word), our method provides an additional layer of guidance that aims to extract text directly answering the user’s question or providing specific information about the context of the queried term in each of the sections. The system provides a summary of the content across different pages, chapters, and books with the goal of answering the user’s question directly. If the system fails to retrieve information that answers the user’s underlying question, the summary will guide the user to the appropriate section by extracting sentences relevant to the term’s context in each section.

The methods presented in this paper are knowledge-rich; input text is distilled to a conceptual representation rather than a strictly syntactic one using an external knowledge base. Our abstraction technique relies on a network of concepts to extract the sentences that are most likely to indicate the context of the queried term(s). It extracts groups of sentences that are often paragraphs or contiguous sub-paragraphs. While our approach does identify individual sentences of interest, surrounding sentences are usually included in the extraction, even if their information content is relatively low. As described in the following sections, proximity to sentences of high initial interest is an important component of the term vector.

This method attempts to balance indicative and informative qualities. Our method is essentially an indicative method – we aim to extract sentences that should be most representative of the queried term. However, our clustering technique attempts to separate different senses or usages of each term and select clusters of sentences across all senses. This content planning step aims to deliver sentences that very closely describe the queried term. This can be viewed as an informative feature of an indicative summarization method.

This work’s key contribution is using values from MetaMap as inputs to Expectation Maximization (EM). EM clusters the terms, and we extract sentences for each cluster. We exploit the statistical model that EM produces to allow the merging of concept clusters. Both this approach and the use of MetaMap-derived values to control clustering are new in natural language processing.

Past work has used UMLS concept networks and MetaMap as a guide for content planning1. This paper presents a novel approach to using a concept network for content planning and sentence extraction. While it is easy to locate a search term in text, it is difficult to search for a specific sense or context of the term. Our method uses Expectation Maximization to cluster concepts into related groups, each representing a distinct usage of the term across the searched text.

Materials & Methods

UMLS

The Unified Medical Language System2 is a collection of natural language knowledge sources tailored to the medical domain. MetaMap3, a tool for tagging input text with semantic data, makes use of the three core resources included in the UMLS: the Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon. MetaMap combines text parsing and part-of-speech tagging with UMLS concept identification, semantic grouping, and hierarchical information from the Metathesaurus.

Overview & Sources

Our implementation works with Harrison’s Online4 and other electronic resources provided through the Columbia University Health Sciences Library. We use the search feature included in the online edition to return relevant sections and chapters related to our search term. Figure 1 outlines this process.

Figure 1. System Overview

In the control or usual case, a user’s query produces a list of sections that might contain the desired information (Standard Results). The user makes an informed guess about where the information may be found and starts exploring the content.

Our system (Experimental Results) uses the same query interface and dispatches the term(s) to the Harrison’s search system. However, our experimental case retrieves all content (HTML) returned from the query and passes it to MetaMap using the web services batch API. Once the results are returned, our sentence extraction program generates summaries and adds them to the original search results interface immediately below each link. Since MetaMap’s output is not affected by the query, it could be run once ahead of time across all pages in the resource collection. Our system instead calls MetaMap dynamically, which adds time to the experimental flow; the tagging operations can easily take 3–5 minutes per 25-page chapter/section. This time is not included in the evaluation, since it could easily be eliminated in a production version of the software.

Our implementation also works across multiple sources and has been tested with Harrison’s, other online resources, and offline content such as the Textbook of Cardiovascular Medicine (Topol) donated by Ovid.

Algorithm

At a high level, our method consists of the following tasks:

  1. Leverage an existing textbook or e-resource search to provide a list of chapters/sections possibly relevant to the user’s search.

  2. Attach concept, part-of-speech, and semantic identifiers to all UMLS terms in the text. MetaMap performs this task.

  3. Select all words sharing the same concept identifier as one or more of the search words. This set contains the “seed” nodes in the graph. Compute semantic relationships between these words and all other terms in the text, forming a network of concept nodes connected by semantic relationships.

  4. Enrich semantic relationships with additional metrics (see metrics section below), forming a weighted network linked by vectors. Vectors are computed for each concept by taking the average (mean) of its connecting link vectors. Note that each “seed” term may have a distinct vector after this averaging process completes.

  5. Run Expectation Maximization across the network space to group concepts. Each concept is represented by a vector. We model the space as a mixture of Gaussian distributions with one initial Gaussian rooted at the point of each “seed” term. The vector creation process places related or supporting terms very close to the associated seed term. As EM runs, it fits each Gaussian to the space. We take all points in the top 25th percentile of each Gaussian and consider the containing sentences as an extracted summary of one sense of the word.

  6. Each concept group’s extracted sentences are presented to the user with links directly to that cluster as realized in the text. If two sentences are extracted yet have an information-poor sentence between them in the text, a final pass will include that sentence to enhance the readability of the extraction.

Step 1 uses the Harrison’s online resource mentioned above. Any search that returns several possibly correct candidates can be enhanced by this process.

Step 2 uses the NLM’s MetaMap implementation to transform plain textbook chapter text into a conceptual condensate: a network of UMLS concepts found in the text connected by relationships from the UMLS Semantic Network. Text in the chapter is unified with UMLS terms. Each term is tagged with a part of speech and a concept identifier by MetaMap.
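The paper does not specify an internal representation for MetaMap’s output; the following is a minimal sketch of one plausible structure, with all field and function names assumed for illustration:

```python
from dataclasses import dataclass

@dataclass
class TaggedTerm:
    """One UMLS concept occurrence identified by MetaMap (all fields assumed)."""
    text: str          # surface form as it appears in the chapter
    cui: str           # UMLS concept identifier, e.g. "C0026266"
    pos: str           # part-of-speech tag
    position: int      # word offset within the chapter
    sentence_id: int   # index of the containing sentence

def find_seeds(terms, query_cuis):
    """Step 3's starting point: terms sharing a concept identifier with a search word."""
    return [t for t in terms if t.cui in query_cuis]
```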

Step 3 is necessary since it is infeasible to compute relationships between every pair of identified concepts in the text. Given a text containing n words, this would require n² MetaMap lookups, which would only be possible if we ran MetaMap over the source text ahead of time. During some of our trials, we implemented a random decay function based on word distance. For each “seed” concept in the text, we compute links between it and all other nodes in the text based on a probability function. The probability of including a distant concept in the network decays as the word distance between the seed term and the other term grows.
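The paper does not give the exact decay function; the sketch below assumes exponential decay over word distance, with the scale parameter chosen arbitrarily:

```python
import math
import random

def sample_links(seed, terms, scale=50.0, rng=random.Random(0)):
    """Probabilistically link a seed concept to other concepts in the text.
    Inclusion probability decays with word distance; the exponential form
    and the 50-word scale are assumptions, not the paper's values."""
    links = []
    for term in terms:
        if term is seed:
            continue
        distance = abs(term.position - seed.position)
        if rng.random() < math.exp(-distance / scale):
            links.append((seed, term))
    return links
```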

Step 4 adds additional information to the semantic links. Each semantic link is assigned a strength value created by the authors. This selection is a form of rule-based content planning that favors relationships we see as important to include in a summary. For example, we assign a high weight to cause/effect relationships. In addition to the semantic strength, each link carries two other metrics, resulting in a three-dimensional vector: the inverse of distance (1 / #words between concepts) and part-of-speech compatibility (we match part-of-speech patterns, which can contribute positively or negatively to this value in the vector). These three values represent the points given as input to EM.
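A sketch of this vector construction; the strength table entries and the default strength are placeholder assumptions (the paper weights cause/effect highly but does not publish its full table), and the per-concept averaging from step 4 is included at the end:

```python
# Hypothetical strength values for UMLS semantic relationships.
SEMANTIC_STRENGTH = {"cause_of": 1.0, "result_of": 1.0, "associated_with": 0.6}

def link_vector(seed, term, relation, pos_compatibility):
    """Three-dimensional vector for one semantic link:
    (semantic strength, inverse word distance, POS compatibility)."""
    strength = SEMANTIC_STRENGTH.get(relation, 0.3)  # default is an assumption
    inv_distance = 1.0 / max(1, abs(term.position - seed.position))
    return (strength, inv_distance, pos_compatibility)

def concept_vector(link_vectors):
    """Average a concept's connecting link vectors (step 4's mean)."""
    n = len(link_vectors)
    return tuple(sum(v[i] for v in link_vectors) / n for i in range(3))
```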

Step 5 takes the three dimensional space defined by the link vectors and groups them by fitting them to a mixture of Gaussian distributions. In essence, this plots the concepts in the text based on how close they are to the “seed” concepts according to the chosen metrics.

Step 5 is also responsible for defining the boundary between the concepts that we would like to include in a summary and those we would not. By assuming that the three dimensional space is a mixture of Gaussian distributions, the EM algorithm iterates to a set of suitable parameters for the Gaussian mixture model. Since hierarchy is only one aspect of the semantic relationships used in the vector above, this yields concepts that are spatially separate from each other.

Step 6 selects each found cluster in the concept space and retrieves the sentences that encompass those corresponding concepts. A few sentence selection rules steer this process into pulling mostly whole paragraphs from the text. “Glue” sentences are extracted to ensure a block of readable text is selected in an area where several sentences are of interest. Each term maintains its relationship to the original text, so linking back to original sources is simple.
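A minimal sketch of the “glue” rule described above, assuming (as the paper implies but does not state) that a gap of a single unselected sentence is filled:

```python
def add_glue(selected):
    """Given the indices of extracted sentences, fill single-sentence gaps
    so the extraction reads as contiguous blocks of text."""
    glued = set(selected)
    for i in selected:
        if i + 2 in selected and i + 1 not in selected:
            glued.add(i + 1)  # include the "glue" sentence between two hits
    return glued

# Example: sentences 4 and 6 were extracted; sentence 5 is pulled in as glue.
assert add_glue({4, 6, 10}) == {4, 5, 6, 10}
```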

EM

We use the Expectation Maximization5 method to estimate hidden parameters in a Gaussian mixture model. In other words, we assume that the space above can be modeled as a mixture of Gaussian distributions. Each term is a point in three-dimensional space; each term influences the vectors of surrounding terms. Our hypothesis is that highly related terms will form areas of high density in the space. EM is then used to locate these areas of high density by finding a mixture model that captures them. We consider each discovered Gaussian as a distinct component of an extracted summary. In other words, we assume that the following probabilistic model holds for our data set:

$$ p(x \mid \Theta) = \sum_{i=1}^{M} a_i \, p_i(x \mid \theta_i) $$

In the model above, each p_i is a Gaussian density function parameterized by θ_i. Each a_i is a mixture weight (between 0 and 1, with all a_i summing to 1). The variable x is the vector described in the algorithm above. EM iterates to a Θ such that concepts distinct from but highly related to the search concepts (we call these “seeds”) are assigned a probability of 1 for one particular Gaussian in the mixture, and we use EM to find a model of Gaussians that fits the rest of the data. Each Gaussian forms a ring around related concepts, usually centered close to a “seed” term.
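A sketch of this step using scikit-learn’s GaussianMixture, which fits a Gaussian mixture by EM; the paper’s actual implementation is unspecified, but seeding one component at each “seed” vector and keeping the top quartile of each component by density follow the description above:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def cluster_concepts(vectors, seed_vectors):
    """Fit one Gaussian per seed term, then keep each component's
    top 25th percentile of member points by density."""
    X = np.asarray(vectors)
    gmm = GaussianMixture(
        n_components=len(seed_vectors),
        means_init=np.asarray(seed_vectors),  # one Gaussian rooted at each seed
        covariance_type="full",
        random_state=0,
    ).fit(X)

    labels = gmm.predict(X)
    clusters = []
    for k in range(gmm.n_components):
        members = np.where(labels == k)[0]
        if members.size == 0:
            continue
        # Log-density of each member point under this component's Gaussian.
        density = np.atleast_1d(
            multivariate_normal(gmm.means_[k], gmm.covariances_[k]).logpdf(X[members])
        )
        cutoff = np.percentile(density, 75)  # boundary of the top quartile
        clusters.append(members[density >= cutoff].tolist())
    return clusters
```

The sentences containing each cluster’s surviving concepts would then be extracted, with glue sentences added as in step 6.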

Results

Our summarizer implementation runs live in conjunction with Harrison’s Online. We evaluated the implementation in a controlled experiment to see if it helped medical professionals (students, residents, and fellows) locate answers to their questions more quickly than using the standard search interface. Users were allowed to query the system in any way they were familiar with, but all users chose a simple one- or two-word query.

Eight test subjects performed one experimental and one control task for a total of sixteen tests. Each user was asked to pose their own question about a given topic. Our list of topics included broad terms such as “mitral regurgitation”, “stenosis”, and “angina”. We constrained topics to anything covered by the “Disorders of the Cardiovascular System” (24 chapters) since our implementation had only been tested on portions of the Harrison’s Online reference. Once the subject formulated the question, he or she queried Harrison’s Online. All users were already familiar with the use of Harrison’s Online. Users recorded their question ahead of time on an evaluation sheet. Figure 2 presents an example of a user-posed question, their chosen query and the corresponding results provided by the experimental system.

Figure 2. Evaluation Example

Since each user performed both a control test (no summary) and an experimental one (summary), four of the subjects performed the experimental test first while the others performed the control test first. Subjects were given a different topic for each of the two tests. Using different topics prevents a subject from taking less time in one case simply because he or she had already viewed the material in the previous test, though it also prevents us from using a paired t-test to establish a statistical difference between each user’s trials. By staggering the order of the experimental and control tests, we also avoid order bias: we noted that the second test, regardless of whether it was a control or experimental test, was often slightly shorter, and staggering the order should mitigate this effect.

After querying Harrison’s Online, the experimental group received an extracted summary of the information available from the content while the control group received the standard search results. In most cases, experimental subjects received ten to twenty sentences extracted from key areas of the search results with links directly to the relevant chapter. Control subjects received a list of chapters, with descriptions, that might contain information about their term and question.

Immediately after the query, users were timed until they felt their question had been answered. We did not include time to run MetaMap in this process since UMLS tagging could be performed ahead of time in a production system. The sentence extraction system was included in the time and adds a small amount of upfront time to the experimental case. We measured the total time in addition to the number of page views (reported as “clicks”) to get to the point of satisfaction. In the experimental group, the subject could say his or her question was answered immediately after viewing the summary sentences or after viewing any number of the deep links embedded in the summary.

One particular danger in summarization is giving the user incomplete information while presenting it as complete. While we do not have a method to safeguard against this in our summarization methods, we did test for it during our evaluation. If our users were falsely satisfied after reading our summary, our hope was to detect it by requiring them to read additional material afterward and report whether the additional content added any value in answering their question. In this case, the total time and click metrics would have been extended to cover this additional time, but no subjects felt that the forced additional content made any difference in answering their question. Most subjects followed at least one link from the summary; only one experimental subject had his question answered directly by the summary. This subject did not see any value in being forced to read additional content.

On average, subjects took six seconds less (Table 1) to find their desired information and viewed one fewer page. Since we use time difference as our metric and stagger the experimental and control cases, this gives a fair measure of whether users need less time to find answers to their questions. All subjects took less time using our system. All page requests ran through the exact same Harrison’s Online web application.

Table 1. User Interaction Metrics

                            Users        Terms Searched (16 total; unique)   Average Time Taken   Average Clicks
Standard Textbook Search    8 (shared)   8                                   21 seconds           4
Search with Summarization   8 (shared)   8                                   15 seconds           3

Users were required to write an answer to their initial question on their evaluation sheet. While this was the only check we performed, we are confident that our subjects would only write down an answer that was accurate and reasonably complete. All users were able to find answers to their questions. They were familiar with the content of Harrison’s but had little knowledge of its organization. Responses were informally reviewed after the evaluation to ensure that no user accepted clearly more or less lenient answers in either the control or experimental case.

Discussion

We have essentially replaced the Harrison’s Online search results with an enhanced version that supports summarization. Our results show that users find information more quickly using our summarizing results page than they do using the standard chapter and section list returned otherwise. Our results also show that our summary does not mislead our subjects or otherwise convince them that they have prematurely found an answer to their question.

In some cases, our extraction method produced a summary consisting of non-contiguous sentences. While this was useful, our current method does not perform any sentence transformation to make the resulting abstract more readable. We intend to enhance this program using transformation methods similar to those used by Barzilay et al.6

This work builds on past statistical methods for indicative summarization using an existing search index7. Our aim was to add informative characteristics by carefully planning content using semantic network methods presented in past work.

Conclusions

We have presented a summarization system that uses extraction to aid the user in finding his or her desired information more quickly. The user is presented with an enhanced interface that shows carefully extracted text from each point in the resource that might contain the desired information. Our methods follow from existing statistical extraction techniques, but we use them in conjunction with a semantic network. The EM algorithm acts as a content planner, guiding sentence selection based on clusters of concepts computed using our metrics. We evaluated our system and showed that users can access information more quickly with it, while dispelling the notion that users of our summaries consider themselves prematurely satisfied with incomplete information.

Acknowledgements

This paper is based upon work supported by the National Science Foundation under Digital Library Initiative Phase II Grant No. IIS-98-17434. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

  1. Fiszman M, Rindflesch T, Kilicoglu H. Summarization of an online medical encyclopedia. Medinfo. 2004:506–510.
  2. Humphreys BL, Lindberg DA, Schoolman HM, et al. The Unified Medical Language System: an informatics research collaboration. J Am Med Inform Assoc. 1998;5(1):1–11.
  3. Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA Symp. 2001:17–21.
  4. Kasper DL, Braunwald E, Fauci A, Hauser S, Longo D, Jameson J, Isselbacher K, eds. Harrison’s Online: Harrison’s Principles of Internal Medicine, 16th edition.
  5. Bilmes J. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report ICSI-TR-97-021, International Computer Science Institute, Berkeley, 1997.
  6. Barzilay R, McKeown K, Elhadad M. Information fusion in the context of multi-document summarization. Proc 37th Annual Meeting of the Association for Computational Linguistics, Maryland, 1999:550–557.
  7. Kan MY, Klavans JL, McKeown KR. Using the annotated bibliography as a resource for indicative summarization. Proc Language Resources and Evaluation Conference (LREC 2002), 2002:1746–1752.
