Abstract
Integrating relevant images into web-based information resources adds value for research and education. This work sought to evaluate the feasibility of using “Web 2.0” technologies to dynamically retrieve and integrate pertinent images into a radiology web site. An online radiology reference, the Collaborative Hypertext of Radiology (CHORUS), which comprised 1,178 textual web documents, was selected as the set of target documents. The ARRS GoldMiner™ image search engine, which incorporated 176,386 images from 228 peer-reviewed journals, retrieved images on demand and integrated them into the documents. At least one image was retrieved in real-time for display as an “inline” image gallery for 87% of the web documents. Each thumbnail image was linked to the full-size image at its original web site. Review of 20 randomly selected CHORUS documents found that 69 of 72 displayed images (96%) were relevant to the target document. Users could click on the “More” link to search the image collection more comprehensively and, from there, link to the full text of the article. A gallery of relevant radiology images can be inserted easily into web pages on any web server. Indexing by concepts and keywords allows context-aware image retrieval, and searching by document title and subject metadata yields excellent results. These techniques allow web developers to easily incorporate a context-sensitive image gallery into their documents.
Key words: Web technology, multimedia, internet technology
Introduction
The Internet has opened new avenues for creating, integrating, and distributing medical knowledge. Many of the recent innovations, termed “Web 2.0,” seek to integrate existing web resources and make them more dynamic, collaborative, and interactive [1–3]. Integrating relevant images into web-based information resources adds value for research and education. The present work tested the feasibility of a simple, automated, context-sensitive platform to integrate relevant radiology images into a target web page’s content. There were two specific aims: (1) to automatically integrate relevant images based on the content of the target document, and (2) to make the process virtually transparent from the standpoint of the web site developer.
Materials and Methods
An online radiology reference, CHORUS (Collaborative Hypertext of Radiology; http://chorus.rad.mcw.edu), was selected as the collection of target documents. The 1,178 textual documents in this web-based collection describe diseases, radiological findings, and pertinent anatomy [4]. Each CHORUS document included Dublin Core metadata elements [5], including the “title” and “subject” metadata. The subject metadata were assigned by the SAPHIRE system [6] and had been incorporated into the CHORUS web site for more than 10 years as part of an effort to improve medical document retrieval using metadata [7]. The subject metadata terms were chosen from the Medical Subject Headings (MeSH®) vocabulary developed by the US National Library of Medicine (NLM) to index the biomedical literature in MEDLINE and PubMed. For example, the CHORUS document on optic nerve glioma incorporated the metadata (“META”) tags:
<META NAME="DC.title" CONTENT="optic nerve glioma">
<META NAME="DC.subject" CONTENT="(SCHEME=MeSH) Cranial Nerve Neoplasms">
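As a rough illustration of how title and subject values could be read from tags of this form, a minimal sketch follows. The function name `extractDublinCore` and the regular-expression approach are assumptions made for brevity; they are not the parser used by the system described here.

```javascript
// Hypothetical helper: pull the Dublin Core title and subject values out of
// META tags written in the form shown above. A regular expression keeps the
// sketch short; a production system would more likely use an HTML parser.
function extractDublinCore(html) {
  const result = { title: null, subject: null };
  const metaPattern =
    /<meta\s+name="DC\.(title|subject)"\s+content="([^"]*)"\s*\/?>/gi;
  let match;
  while ((match = metaPattern.exec(html)) !== null) {
    const field = match[1].toLowerCase();
    // Drop a leading scheme qualifier such as "(SCHEME=MeSH) ".
    const value = match[2].replace(/^\(SCHEME=[^)]*\)\s*/i, "").trim();
    // Keep only the first value found for each field.
    if (result[field] === null) {
      result[field] = value;
    }
  }
  return result;
}

const sampleHead = [
  '<META NAME="DC.title" CONTENT="optic nerve glioma">',
  '<META NAME="DC.subject" CONTENT="(SCHEME=MeSH) Cranial Nerve Neoplasms">',
].join("\n");

console.log(extractDublinCore(sampleHead));
```

Stripping the scheme qualifier leaves a plain term that could be passed to a search as a query string.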
American Roentgen Ray Society (ARRS) GoldMiner™ (http://goldminer.arrs.org), a radiology image search engine, was employed to identify pertinent images. As of February 2008, GoldMiner incorporated 176,386 images from open-access articles in 228 online peer-reviewed journals. The images were indexed by keywords and by concepts from the NLM’s Unified Medical Language System (UMLS) Metathesaurus® (http://umlsinfo.nlm.nih.gov) [8,9]. Concepts were extracted from the unstructured text of figure captions using the NLM’s MetaMap Transfer® (MMTx; http://mmtx.nlm.nih.gov) software [10]. A web-based search engine interface provided access to the GoldMiner image library [11].
The web server software scanned each target document to identify the document’s title and subject metadata. The title text was used as a query string; if no matching images were found, the subject metadata string was used instead. Up to four images were selected. The system precompiled Hypertext Markup Language (HTML) text to display the images and their captions, stored that text string in the database, and indexed it by the uniform resource locator (URL) of the referring web page. In this way, the server needed only to look up the referring page’s URL and return the corresponding precompiled HTML text string.
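The selection and precompilation steps described above can be sketched as follows. This is an illustrative outline only: the `searchImages` callback, the image field names, and the in-memory `Map` standing in for the database table are all assumptions, and the original system was written in PHP with MySQL.

```javascript
// Hypothetical sketch of title-first search with subject fallback, followed
// by precompilation of the gallery HTML keyed by the page URL.
const MAX_IMAGES = 4;

function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Try the title first; use the subject only if the title finds nothing.
function selectImages(metadata, searchImages) {
  let images = searchImages(metadata.title);
  if (images.length === 0 && metadata.subject) {
    images = searchImages(metadata.subject);
  }
  return images.slice(0, MAX_IMAGES);
}

// Build the HTML fragment once so that later requests need no search.
function precompileGallery(images) {
  return images
    .map(
      (img) =>
        `<a href="${escapeHtml(img.fullSizeUrl)}">` +
        `<img src="${escapeHtml(img.thumbnailUrl)}" ` +
        `alt="${escapeHtml(img.caption)}"></a>`
    )
    .join("");
}

// An in-memory Map stands in for the database table indexed by page URL.
const galleryCache = new Map();

function storeGallery(pageUrl, metadata, searchImages) {
  const html = precompileGallery(selectImages(metadata, searchImages));
  galleryCache.set(pageUrl, html);
  return html;
}

// Example with a stub search: the title finds nothing, the subject finds five.
const stubSearch = (query) =>
  query === "stub subject term"
    ? Array.from({ length: 5 }, (_, i) => ({
        fullSizeUrl: `http://example.org/full/${i}.jpg`,
        thumbnailUrl: `http://example.org/thumb/${i}.jpg`,
        caption: `Example caption ${i}`,
      }))
    : [];

const galleryHtml = storeGallery(
  "http://example.org/doc/sample.html",
  { title: "stub title", subject: "stub subject term" },
  stubSearch
);
console.log((galleryHtml.match(/<img /g) || []).length); // 4
```

Keying the stored fragment by page URL means a later request can be answered with a single lookup rather than a new search.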
Preliminary analysis was conducted to assess the relevance of the retrieved images to the target documents. A strict definition of “relevance” was applied. It was not sufficient for a figure caption to include one or more words from the query term; a retrieved image was considered relevant only if it displayed the imaging finding, anatomy, or disease described in the title or subject of the document. For example, an image whose caption included the words “renal” and “calculi” might not be considered relevant to a document on renal calculi unless the image showed the requisite finding. The author manually reviewed the relevance of the retrieved images for 20 randomly selected CHORUS documents.
The CHORUS documents were modified on their server to insert a single line of HTML text into each document:
<SCRIPT TYPE="text/javascript" SRC="http://goldminer.arrs.org/inline.php"></SCRIPT>
This command invoked a server-side program that returned the precompiled JavaScript code to incorporate up to four images as an “image gallery” within the document. This image gallery was inserted “inline” into the web document; that is, the images were incorporated in real-time into the document, rather than being viewed through a link to a separate web page. When the server detected a known referring web page, it returned the HTML instructions to display a set of images within the body of the referring document. The command line did not, itself, include any parameters to specify the search; the GoldMiner system determined which document had called the command by examining the HTTP_REFERER parameter, which specifies the URL of the referring document. This approach greatly simplified the task of modifying the target documents to include the special JavaScript command. The web server was based on open-source technologies: the software was written in the PHP programming language (version 5.1.2; www.php.net), and used the MySQL relational database management system (version 5.0; MySQL AB; www.mysql.org).
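The referrer-based lookup described above can be sketched as follows. The function name `buildInlineResponse`, the `precompiled` map, and the use of `document.write` as the insertion mechanism are assumptions for illustration; the source does not show the code that the server returned.

```javascript
// Hypothetical handler logic: map the Referer request header to a stored
// HTML fragment and wrap it in a JavaScript statement for the browser.
// `headers` is a plain object with lower-cased header names, as Node.js
// provides; `precompiled` is a Map from page URL to HTML string.
function buildInlineResponse(headers, precompiled) {
  const referer = headers["referer"] || "";
  const html = precompiled.get(referer);
  if (html === undefined) {
    // Unknown referring page: return a script that does nothing.
    return "/* no gallery for this page */";
  }
  // JSON.stringify produces a correctly quoted JavaScript string literal.
  return `document.write(${JSON.stringify(html)});`;
}

const precompiledExample = new Map([
  ["http://example.org/doc/sample.html", '<img src="thumb.jpg" alt="demo">'],
]);

console.log(
  buildInlineResponse(
    { referer: "http://example.org/doc/sample.html" },
    precompiledExample
  )
);
```

Because the page URL arrives in the request header, the script tag placed in each document can be identical, which is what keeps the change to the target documents to one uniform line.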
Results
The client- and server-side scripts functioned properly and successfully integrated the GoldMiner Image Gallery into the target documents when images were identified. At least one potentially relevant image was retrieved in real-time for display for 1,081 (92%) of the 1,178 CHORUS documents. For 1,015 CHORUS documents (86%), four or more images were identified; a maximum of four images were displayed.
Of the 20 CHORUS documents reviewed, two had no images; the other 18 documents each included four displayed images. Of the 72 images displayed, 69 were judged to be relevant to the target document; hence, the estimated precision was 96% (95% confidence interval, 91–100%). For example, of four images retrieved for the CHORUS document on “spleen cyst,” three showed splenic cysts and were counted as relevant. The fourth image was not considered relevant: it did not show a splenic cyst even though its caption contained the words “spleen” and “cyst”. For the CHORUS document on “bagassosis,” although the four retrieved images did not contain that word in their captions, all four were considered relevant because they showed hypersensitivity pneumonitis (of which bagassosis is a subtype) and were pertinent to the document’s MeSH subject term, “Alveolitis, Extrinsic Allergic.”
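The precision estimate can be checked with a short calculation. The source does not state which interval method was used; the sketch below assumes a normal-approximation (Wald) interval capped at 100%, which is consistent with the figures reported above. The function name is hypothetical.

```javascript
// Precision with a normal-approximation (Wald) 95% confidence interval,
// clamped to the range [0, 1]. The choice of method is an assumption.
function precisionWithInterval(relevant, displayed, z = 1.96) {
  const p = relevant / displayed;
  const halfWidth = z * Math.sqrt((p * (1 - p)) / displayed);
  return {
    precision: p,
    lower: Math.max(0, p - halfWidth),
    upper: Math.min(1, p + halfWidth),
  };
}

const estimate = precisionWithInterval(69, 72);
console.log(
  Math.round(estimate.precision * 100),
  Math.round(estimate.lower * 100),
  Math.round(estimate.upper * 100)
); // 96 91 100
```

With only three non-relevant images in the sample, the unclamped upper bound slightly exceeds 100%, which is why the interval is capped.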
The image gallery included JavaScript commands to display an abstract of the figure caption when a user hovered over an image (Fig. 1). Each thumbnail image was linked to the full-size image at the original web site (Fig. 2). Users could click on the “More” link to search the image collection more comprehensively and, from there, link to the full text of the article.
The ARRS GoldMiner server retrieved images from its database using a proprietary algorithm that matched the target document’s metadata with words and concepts from figure captions and article titles in GoldMiner's database. The MeSH vocabulary is one component of the medical knowledge model used to index the GoldMiner collection, so these terms provided a natural bridge between the CHORUS documents and GoldMiner images. Such direct matching is helpful, but not required, as GoldMiner can search using arbitrary terms.
Discussion
This work demonstrates the feasibility of real-time integration of text with potentially relevant images for education and clinical decision support. A gallery of relevant radiology images can be inserted easily into web pages on any web server. Indexing by concepts and keywords allows context-aware image retrieval, and searching by document title and subject metadata yields excellent results. The use of a single, uniform line of HTML code makes it extremely easy for web developers to incorporate an image gallery into their documents. The current application represents an example of a web “mashup,” a hybrid application that combines data from more than one source [13].
This system updates the displayed images automatically without intervention by the owner of the source web documents. The GoldMiner server periodically crawls the target web pages to check for updated content and also recompiles the code that displays the inlined images. Thus, as the image library is updated, the most recent relevant images are selected. Precompilation also allows the server to operate with maximal efficiency. For example, more than 15,000 CHORUS documents are viewed daily. Precaching the search results reduces overhead on GoldMiner’s database system; it is far more efficient to retrieve a precompiled HTML string than to execute a potentially complex database query each time a set of inlined images is requested.
GoldMiner retrieved images based on the appearance of the query words or related concepts within the associated figure caption. Preliminary analysis of GoldMiner’s search results has shown very high precision: of the images retrieved, 88% included the words or concepts queried [14]. Using strict criteria, this study found the retrieved images to be highly relevant to their target documents.
GoldMiner also can retrieve a set of images in real-time using a user-specified query string. For example, the specification:
<SCRIPT TYPE="text/javascript"
SRC="http://goldminer.arrs.org/inline.php?query=kidney+stones">
</SCRIPT>
generates a GoldMiner search for images that match the keywords and/or concept of “kidney stones.” This approach offers additional flexibility and allows developers to integrate ad-hoc search queries into a web page.
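A query string of this kind must be URL-encoded before it is placed in the SRC attribute. The following sketch shows one way to build the tag; the helper name `buildQueryScriptTag` is hypothetical, and the endpoint and `query` parameter are taken from the example above.

```javascript
// Hypothetical helper: build the script tag for an ad hoc image query.
// URLSearchParams applies form encoding, so spaces become "+" and
// characters such as "&" or quotation marks are percent-encoded.
function buildQueryScriptTag(
  query,
  endpoint = "http://goldminer.arrs.org/inline.php"
) {
  const params = new URLSearchParams({ query });
  return (
    `<SCRIPT TYPE="text/javascript" ` +
    `SRC="${endpoint}?${params.toString()}"></SCRIPT>`
  );
}

console.log(buildQueryScriptTag("kidney stones"));
```

Encoding the query this way keeps multi-word or punctuated search terms from breaking the URL or the surrounding HTML attribute.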
The system can integrate relevant images on demand using Asynchronous JavaScript and XML (AJAX) technologies. AJAX is a group of interrelated techniques used to create interactive web applications [15]. AJAX typically combines: (1) XHTML or HTML and Cascading Style Sheets to define how information is to be displayed; (2) a client-side scripting language, such as JavaScript or JScript, to dynamically present and modify the web page’s Document Object Model; and (3) a mechanism, typically the XMLHttpRequest object, to exchange data asynchronously with the web server. The goal of AJAX technologies is to allow the client (browser) software to interact “behind the scenes” with the web server to make the web application more responsive and interactive. For example, AJAX has been used to develop PubMed Interact, which allows users to refine search parameters and interact with the search results to retrieve and display relevant information [16].
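The asynchronous exchange described above can be illustrated with a minimal XMLHttpRequest sketch. The function name `requestGallery`, the callback shape, and the URL in the usage comment are assumptions; the source does not show how the system issued such requests.

```javascript
// Hypothetical sketch: request a gallery fragment asynchronously and pass
// the response text to a callback, leaving the rest of the page untouched.
function requestGallery(url, onHtml, XhrClass) {
  // In a browser the built-in XMLHttpRequest is used unless another
  // implementation is supplied (for example, a stand-in during testing).
  const Xhr = XhrClass || XMLHttpRequest;
  const xhr = new Xhr();
  xhr.open("GET", url, true); // third argument true = asynchronous
  xhr.onreadystatechange = function () {
    // readyState 4 means the request has completed.
    if (xhr.readyState === 4 && xhr.status === 200) {
      onHtml(xhr.responseText);
    }
  };
  xhr.send();
}

// Browser usage (hypothetical same-origin URL and element id):
// requestGallery("/gallery?query=kidney+stones", function (html) {
//   document.getElementById("gallery").innerHTML = html;
// });
```

Only the gallery element is updated when the response arrives, so the user can keep reading the document while the images load.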
An overarching theme of Web 2.0 is to facilitate collaboration through web technology [17]. Successful approaches “afford users the added advantage of reducing the technical skill required to use these features by allowing users to focus on the information and collaborative tasks themselves” [1]. A goal is to make the technology “transparent,” so that the user or web author can concentrate on the subject matter, rather than on the technological platform.
Several “Web 2.0” applications in radiology seek to provide interactive, collaborative web-based resources. A wiki is a website that allows visitors to add, remove, and edit content; the most famous wiki is the online encyclopedia Wikipedia (http://wikipedia.org). RadiologyWiki (www.radiologywiki.org) [18], RadsWiki (www.radswiki.net), and Radiopaedia (www.radiopaedia.org) are examples of multi-author information resources in radiology. ClubPACS (http://clubpacs.com) [19] and Women’s Imaging Online (http://womensimagingonline.arrs.org) provide web-based communities of practice for PACS administrators and women’s imaging professionals, respectively.
The use of semantic information, such as that provided by the UMLS Metathesaurus, can improve navigation within a web-based network of images by relating medical concepts [20,21]. The UMLS Metathesaurus is critical to GoldMiner’s ability to retrieve images not only by matching strings of text but also by understanding the underlying medical concepts. The MeSH terms form one component of the UMLS Metathesaurus. The RadLex vocabulary (http://radlex.org) [22] promises to be an important addition to promote the uniform indexing of radiology content. The American Roentgen Ray Society is actively continuing the development of ARRS GoldMiner™ and provides this service freely on the Internet. Policies and instructions for those who wish to incorporate the ARRS GoldMiner Image Gallery into their web pages are available at the ARRS GoldMiner web site.
References
- 1. Boulos MN, Maramba I, Wheeler S. Wikis, blogs and podcasts: a new generation of Web-based tools for virtual collaborative clinical practice and education. BMC Med Educ. 2006;6:41. doi: 10.1186/1472-6920-6-41.
- 2. Giustini D. How Web 2.0 is changing medicine. BMJ. 2006;333:1283–1284. doi: 10.1136/bmj.39062.555405.80.
- 3. Kamel Boulos MN, Wheeler S. The emerging Web 2.0 social software: an enabling suite of sociable technologies in health and health care education. Health Info Libr J. 2007;24:2–23. doi: 10.1111/j.1471-1842.2007.00701.x.
- 4. Kahn CE Jr. CHORUS: a computer-based radiology handbook for international collaboration via the World Wide Web. RadioGraphics. 1995;15:963–970. doi: 10.1148/radiographics.15.4.7569141.
- 5. Powell A. Expressing Dublin Core in HTML/XHTML meta and link elements. <http://dublincore.org/documents/dcq-html/>. Dublin Core Metadata Initiative, 2003. Accessed 17 Feb. 2007.
- 6. Hersh WR, Greenes RA. SAPHIRE: an information retrieval system featuring concept matching, automatic indexing, probabilistic retrieval, and hierarchical relationships. Comput Biomed Res. 1990;23:410–425. doi: 10.1016/0010-4809(90)90031-7.
- 7. Malet G, Munoz F, Appleyard R, Hersh W. A model for enhancing Internet medical document retrieval with “medical core metadata”. J Am Med Informatics Assoc. 1999;6:163–172. doi: 10.1136/jamia.1999.0060163.
- 8. Humphreys BL, Lindberg DAB, Schoolman HM, Barnett GO. The Unified Medical Language System: an informatics research collaboration. J Am Med Informatics Assoc. 1998;5:1–11. doi: 10.1136/jamia.1998.0050001.
- 9. Lindberg DAB, Humphreys BL, McCray AT. The Unified Medical Language System. Methods Inf Med. 1993;32:281–291. doi: 10.1055/s-0038-1634945.
- 10. Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA Symp. 2001;2001:17–21.
- 11. Kahn CE Jr, Thao C. GoldMiner: a radiology image search engine. AJR Am J Roentgenol. 2007;188:1475–1478. doi: 10.2214/AJR.06.1740.
- 12. Takanashi J, Tada H, Barkovich AJ, Saeki N, Kohno Y. Pituitary cysts in childhood evaluated by MR imaging. AJNR Am J Neuroradiol. 2005;26:2144–2147.
- 13. Cho A. An introduction to mashups for health librarians. J Can Health Lib Assoc. 2007;28:19–22.
- 14. Kahn CE Jr, Thao C, Rubin DL. Concept discovery in radiology figure captions: application of MetaMap Transfer (MMTx) to the ARRS GoldMiner database. In: Radiological Society of North America, Chicago, IL, 2007.
- 15. Garrett JJ. Ajax: A New Approach to Web Applications. <http://www.adaptivepath.com/ideas/essays/archives/000385.php>. Adaptive Path, 2005. Accessed 28 January 2008.
- 16. Muin M, Fontelo P, Ackerman M. PubMed Interact: an interactive search application for MEDLINE/PubMed. AMIA Annu Symp Proc. 2006;2006:1039.
- 17. O'Reilly T. What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software. <http://www.oreilly.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html>. O'Reilly Media, Inc., 2005. Accessed 14 Dec. 2007.
- 18. Streeter JL, Lu MT, Rybicki FJ. RadiologyWiki.org: the free radiology resource that anyone can edit. RadioGraphics. 2007;27:1193–1200. doi: 10.1148/rg.274065090.
- 19. Nagy P, Kahn CE Jr, Boonn W, et al. Building virtual communities of practice. J Am Coll Radiol. 2006;3:716–720. doi: 10.1016/j.jacr.2006.06.005.
- 20. Frankewitsch T, Prokosch U. Navigation in medical Internet image databases. Med Inform Internet Med. 2001;26:1–15. doi: 10.1080/14639230010013971.
- 21. Lowe HJ, Antipov I, Hersh W, Smith CA. Towards knowledge-based retrieval of medical images. The role of semantic indexing, image content representation and knowledge-based retrieval. Proc AMIA Symp. 1998;1998:882–886.
- 22. Langlotz CP. RadLex: a new method for indexing online educational materials. RadioGraphics. 2006;26:1595–1597. doi: 10.1148/rg.266065168.