BMJ. 2008 Jul 12;337(7661):70. doi: 10.1136/bmj.a742

Internet crawler uses unconventional information sources to track infectious disease outbreaks

Susan Mayor
PMCID: PMC2453263  PMID: 18614514

An automated data gathering system that crawls the internet for information from non-traditional sources such as online news outlets, discussion forums, and government websites is proving effective in tracking emerging infectious diseases, says a new study (PLoS Med 2008;5:e151 doi: 10.1371/journal.pmed.0050151).

Researchers from the Children’s Hospital Boston and Harvard Medical School developed HealthMap as a freely accessible and automated system that monitors and organises information on emerging diseases in real time.

“Web-based sources can play an important role in early event detection . . . by providing current, highly local information about outbreaks, even from areas relatively invisible to traditional global public health efforts,” they wrote.

The existing network of traditional surveillance systems managed by health organisations and multinational agencies has wide gaps in geographical coverage and often suffers from poor information flow across national borders, they say.

“At the same time,” explained the study’s lead author, John Brownstein, assistant professor of paediatrics at the Boston Children’s Hospital and Harvard Medical School, “an enormous amount of valuable information about infectious diseases is found in web accessible information such as discussion sites, disease reporting networks, and news outlets.”

These new sources are potentially useful, and they now trigger most of the outbreak verifications carried out by the World Health Organization. But it can be difficult to cope with the volume of information and to distinguish “signal from noise.”

HealthMap continually collects reports of new and ongoing outbreaks of infectious disease and then uses software similar to spam filters to integrate and filter the information to provide online summaries.
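The comparison with spam filters can be illustrated with a short sketch. The study summary here does not say which algorithm HealthMap uses, so the naive Bayes classifier below is an assumption, chosen only because it is a common technique in spam filtering. The class name, the labels "relevant" and "noise," and the training headlines are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesFilter:
    """Two-class multinomial naive Bayes with add-one smoothing.

    Hypothetical illustration only; not HealthMap's actual classifier.
    Each label needs at least one training example before classify() is called.
    """

    LABELS = ("relevant", "noise")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Record one labelled example."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens, label, vocab_size):
        """Log prior plus the sum of smoothed log likelihoods of the tokens."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        for token in tokens:
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def classify(self, text):
        """Return the label with the higher log score."""
        tokens = tokenize(text)
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        scores = {
            label: self._log_score(tokens, label, len(vocab))
            for label in self.LABELS
        }
        return max(scores, key=scores.get)


# Made-up training headlines, for illustration only.
report_filter = NaiveBayesFilter()
report_filter.train("Health officials confirm outbreak of avian influenza in poultry farms", "relevant")
report_filter.train("Salmonella cases linked to tomatoes reported in New Mexico", "relevant")
report_filter.train("Team catches football fever ahead of cup final", "noise")
report_filter.train("New film goes viral on video sharing sites", "noise")

print(report_filter.classify("Officials report new salmonella outbreak"))
```

A real system would need far more training data than four headlines. The sketch shows only how word frequencies can separate reports about disease from text that merely borrows disease vocabulary, such as "fever" or "viral."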

It currently gathers reports from 14 sources, including Google News and expert discussion sites, which summarise information from more than 20 000 different websites. The search criteria include disease names, symptoms, and keywords. The system collects an average of 300 reports a day, most of which (85%) come from news media sources. The articles are analysed for duplication and content. Duplicate articles are removed, while those that discuss new information about an ongoing situation are integrated with other relevant articles and added to an interactive map.
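The keyword search and duplicate removal described above might look something like the following sketch. It is not HealthMap's code: the keyword list, the use of Jaccard similarity on word sets, and the 0.8 threshold are all assumptions made for illustration.

```python
import re

# Hypothetical search terms; the real system's criteria also cover symptoms.
DISEASE_KEYWORDS = {"influenza", "salmonella", "tuberculosis"}


def words(text):
    """Return the set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two word sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_and_dedupe(reports, threshold=0.8):
    """Keep reports that mention a keyword and are not near-duplicates.

    A report is treated as a duplicate when its word set overlaps an
    already kept report by at least `threshold` (an assumed cut-off).
    """
    kept = []
    for report in reports:
        tokens = words(report)
        if not tokens & DISEASE_KEYWORDS:
            continue  # no disease keyword: outside the search criteria
        if any(jaccard(tokens, words(prev)) >= threshold for prev in kept):
            continue  # near-duplicate of a report already kept
        kept.append(report)
    return kept
```

Called on a list of headlines, `filter_and_dedupe` returns the first copy of each distinct disease report in its original order. Integrating follow-up articles on an ongoing situation, as the system is described as doing, would need an extra step that this sketch leaves out.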

An evaluation of HealthMap over 43 weeks from 1 October 2006 to 18 July 2007 showed that the system detected reports on a wide variety of pathogens, with information on 141 unique infectious disease categories reported through the Google News feed alone. The frequency of reports about particular pathogens was related to the direct or potential economic and social disruption they caused rather than to the associated morbidity or mortality. The greatest numbers of reports were for avian influenza (877) and Escherichia coli (733), followed by salmonella (479).

Over the study period reports of outbreaks of infectious disease came from 174 countries, with the greatest number from the United States (4351 reports), the United Kingdom (1018), Canada (880), and China (737). A clear bias was shown towards greater reporting from countries with more media outlets, more developed public health resources, and greater availability of electronic communication.

The research group is now developing ways to improve coverage. In particular they want more information from Africa and South America, which have the highest risk and burden of emerging infectious diseases. To achieve this they are looking at monitoring other internet sources, such as blogs, discussion sites, and listservs (automated email forwarding systems that allow any member of a group of people to email all other members).

“We are also developing contacts with people in developing countries to provide further information,” said Clark Freifeld, a research software developer at the Boston Children’s Hospital and Harvard Medical School.

Comparing HealthMap with reports of emerging outbreaks from existing agencies has shown its validity. Examples include a recent outbreak of salmonella associated with tomatoes in the US, which was tracked by an increase in news reports of gastrointestinal disease in New Mexico, and the case of a UK teacher who contracted tuberculosis in Hong Kong.

The project is being funded by Google.org, the philanthropic arm of Google.

HealthMap is at www.healthmap.org.

Cite this as: BMJ 2008;337:a742


Articles from BMJ : British Medical Journal are provided here courtesy of BMJ Publishing Group
