Abstract
A web-based virtual library of peer-reviewed radiological images was created for use in education and clinical decision support. Images were obtained from open-access content of five online radiology journals and one e-learning web site. Figure captions were indexed by Medical Subject Heading (MeSH) codes, imaging modality, and patient age and sex. This digital library provides a new, valuable online resource.
Background
The overall goal of this project was to create a digital library of radiological images that could be accessed readily for education and clinical decision making. By indexing the captions of figures in the radiological literature, the image library would provide information about the images that is more granular than the article-level indexing that PubMed provides. Images in many teaching files are indexed only by textual keywords, not by a controlled vocabulary such as MeSH. We also sought to identify images by imaging modality and by patient age and sex.
Materials and Methods
We incorporated open-access content from five leading peer-reviewed radiology journals. Several large radiology societies – including the American Roentgen Ray Society, the American Society of Neuroradiology, the British Institute of Radiology, and the Radiological Society of North America – make the content of their journals available through the Web 12 to 24 months after publication. All of the selected journals are written in English and are hosted online by HighWire Press, a division of Stanford University Libraries. We also included content from the European Association of Radiology’s EURORAD E-Learning Initiative, which comprises more than 1900 peer-reviewed case reports with high-quality images. Data were stored in a MySQL database (version 4.1; MySQL AB, www.mysql.com). Software was written in the PHP programming language.
A web robot was created to harvest figure captions from these online sources. For each article, the system recorded the title, journal, uniform resource locator (URL) of the full-text online article, and digital object identifier (DOI) if available. For journal articles, the system captured the PubMed identifier (PMID). We obtained Medical Subject Heading (MeSH) codes from Medline using the National Library of Medicine’s eQuery and eFetch web-based utilities. MeSH codes assigned by EURORAD to index its content were captured by the harvesting software.
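As an illustration of the MeSH lookup step, the following minimal sketch builds an EFetch request for a PubMed identifier and extracts MeSH descriptors from the returned XML. It is written in Python rather than the PHP used for the project, and the function names, the sample record, and its PMID are assumptions made for this example; the harvesting software itself is not described at this level of detail.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base address of the National Library of Medicine's EFetch utility.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmid):
    """Return the EFetch URL that requests one PubMed record as XML."""
    query = urlencode({"db": "pubmed", "id": pmid, "retmode": "xml"})
    return EFETCH_URL + "?" + query


def extract_mesh_headings(xml_text):
    """Return (descriptor UI, descriptor name) pairs found in PubMed XML."""
    root = ET.fromstring(xml_text)
    headings = []
    for descriptor in root.iter("DescriptorName"):
        headings.append((descriptor.get("UI"), descriptor.text))
    return headings


# Abbreviated, made-up record in the PubMed XML layout (illustration only;
# the PMID is a placeholder and the record is not a real citation).
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000000</PMID>
      <MeshHeadingList>
        <MeshHeading>
          <DescriptorName UI="D014057" MajorTopicYN="N">Tomography, X-Ray Computed</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D008279" MajorTopicYN="Y">Magnetic Resonance Imaging</DescriptorName>
        </MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""
```

In practice the URL from `build_efetch_url` would be fetched over HTTP and the response body passed to `extract_mesh_headings`; the network call is omitted here so that the sketch runs offline.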
The patient’s age and sex were parsed from the figure caption and stored in the database. The system determined the imaging modality by checking the caption for a set of modality-related words shown to appear with high frequency. The National Library of Medicine’s MetaMap Transfer (MMTx) software was used to map the captions’ unstructured text to concepts in the UMLS Metathesaurus. A simple Web-based user interface was created to facilitate searching. Search results could be filtered by imaging modality, age group, and/or sex.
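The caption parsing described above can be sketched as follows. This is a minimal illustration in Python (not the PHP of the actual system), and both the regular expression and the keyword lists are assumptions: the abstract does not enumerate the words or patterns that were used.

```python
import re

# Hypothetical keyword lists; the terms actually used are not given in the text.
MODALITY_KEYWORDS = {
    "CT": ("ct", "computed tomography"),
    "MRI": ("mr", "mri", "magnetic resonance", "t1-weighted", "t2-weighted"),
    "Ultrasound": ("ultrasound", "sonogram", "sonography", "doppler"),
    "Radiograph": ("radiograph", "radiography"),
}

# Matches phrases such as "45-year-old woman" or "3 months old".
AGE_SEX_PATTERN = re.compile(
    r"(?P<age>\d{1,3})[- ](?P<unit>year|month|week|day)s?[- ]old"
    r"(?:\s+(?P<sex>man|woman|boy|girl|male|female)\b)?",
    re.IGNORECASE,
)

SEX_CODES = {
    "man": "M", "boy": "M", "male": "M",
    "woman": "F", "girl": "F", "female": "F",
}


def parse_age_sex(caption):
    """Return the first age/sex mention in a caption, with None for gaps."""
    match = AGE_SEX_PATTERN.search(caption)
    if match is None:
        return {"age": None, "unit": None, "sex": None}
    sex_word = match.group("sex")
    return {
        "age": int(match.group("age")),
        "unit": match.group("unit").lower(),
        "sex": SEX_CODES[sex_word.lower()] if sex_word else None,
    }


def classify_modalities(caption):
    """Return every modality whose keywords occur as whole words in the caption."""
    text = caption.lower()
    found = []
    for modality, keywords in MODALITY_KEYWORDS.items():
        # Allow an optional plural "s" so "radiographs" matches "radiograph".
        if any(re.search(r"\b" + re.escape(k) + r"s?\b", text) for k in keywords):
            found.append(modality)
    return found
```

Whole-word matching is used so that a short keyword such as "ct" does not fire inside unrelated words. A caption that mentions no age, or none of the listed keywords, is left unclassified, which is consistent with only a portion of the collection being labeled.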
Results
We collected a total of 10,766 articles and 82,566 figures from the six online sources. The system classified 83.3 percent of the images by imaging modality based on their captions. Photographs and graphics (charts, drawings, and other illustrations) made up 4.4 percent of the collection. The patient’s age and/or sex were identified for 60.8 percent of the images in the collection based on information in the figure caption.
Discussion
The search interface provides an easy-to-use tool for access to a large pool of images and associated text. The image library can be searched by concept and by keyword. Users can limit their search by imaging modality and patient age and sex. We also plan to apply the RadLex vocabulary for radiology, currently under development, to further index our database’s content.
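A filtered keyword search of the kind described here might look like the sketch below. It uses Python with an in-memory SQLite database as a stand-in for the PHP and MySQL of the actual system, and the table name, column names, and sample rows are hypothetical; the real schema is not given.

```python
import sqlite3


def create_demo_db():
    """Build a small in-memory table of captions (made-up rows for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE figure (id INTEGER PRIMARY KEY, caption TEXT, "
        "modality TEXT, age_years REAL, sex TEXT)"
    )
    conn.executemany(
        "INSERT INTO figure (caption, modality, age_years, sex) VALUES (?, ?, ?, ?)",
        [
            ("CT scan in a 45-year-old woman shows a hepatic mass.", "CT", 45, "F"),
            ("MR image in a 7-year-old boy shows a hepatic mass.", "MRI", 7, "M"),
            ("Chest radiograph in a 60-year-old man shows a nodule.", "Radiograph", 60, "M"),
        ],
    )
    return conn


def search_figures(conn, keyword, modality=None, min_age=None, max_age=None, sex=None):
    """Return (id, caption) rows matching the keyword and any filters supplied."""
    clauses = ["caption LIKE ?"]
    params = ["%" + keyword + "%"]
    if modality is not None:
        clauses.append("modality = ?")
        params.append(modality)
    if min_age is not None:
        clauses.append("age_years >= ?")
        params.append(min_age)
    if max_age is not None:
        clauses.append("age_years <= ?")
        params.append(max_age)
    if sex is not None:
        clauses.append("sex = ?")
        params.append(sex)
    # Only fixed clause text is concatenated; user values are bound as parameters.
    sql = "SELECT id, caption FROM figure WHERE " + " AND ".join(clauses) + " ORDER BY id"
    return conn.execute(sql, params).fetchall()
```

Each filter is optional, so the same function serves an unfiltered keyword search and a search narrowed by modality, age range, or sex.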
Acknowledgment
This work was supported in part by the American Roentgen Ray Society.
