Summary
Objectives: To introduce and summarize current research in the field of Public Health and Epidemiology Informatics.
Methods: The 2018 literature concerning public health and epidemiology informatics was searched in PubMed and Web of Science, and the returned references were reviewed by the two section editors to select 15 candidate best papers. These papers were then peer-reviewed by external reviewers to give the editorial team an informed basis for selecting the best papers.
Results: Among the 805 references retrieved from PubMed and Web of Science, three were finally selected as best papers. All three papers address surveillance using digital tools. One study concerns the surveillance of influenza, another the surveillance of emerging animal infectious diseases, and the last one the surveillance of foodborne illness. The sources of information are, respectively, Twitter, Google News, and Yelp restaurant reviews. Machine learning approaches are most often used to detect signals.
Conclusions: Surveillance is a central topic in public health informatics, with a growing use of machine learning approaches given the size and complexity of the data. Evaluating the approaches developed remains a serious challenge.
Keywords: Public health, epidemiology, surveillance, medical informatics, International Medical Informatics Association, artificial intelligence
Introduction
Compared with the 2017 literature analyzed in the Public Health and Epidemiology Informatics section of the International Medical Informatics Association (IMIA) Yearbook 1 , new terms appeared in the 2018 literature in addition to Precision Public Health and Digital Epidemiology: infodemiology and infoveillance 2 3 4 . A large proportion of the papers published in public health informatics address epidemiological surveillance based on the new data generated in the current digital era. These papers include analyses of massive data from social media (used as a so-called social sensor) or from electronic health records (EHRs). The availability of these data has created new opportunities for passive surveillance. However, these data must be organized within an appropriate architecture to become valuable.
The use of web-based data requires Natural Language Processing (NLP) approaches to extract the information. Electronic health records may also benefit from NLP, but the key element is most often the integration of a large volume of structured and unstructured clinical data into a data warehouse. Several solutions are now widely used to build a data warehouse, such as i2b2 5 or LabKey 6 . Once the data collection architecture is in place, signals may be detected by machine learning approaches ranging from standard statistical methods to neural networks. At this stage, the work is far from finished. In fact, a crucial step is the evaluation of the proposed surveillance system. This evaluation is difficult: it requires a good reference (i.e., a gold standard), which is rarely perfect, and multiple sources of information are often used. Furthermore, the algorithm is most often dynamic, because the system keeps learning from newly collected real-time data, which may require repeated evaluation. Hence, as highlighted in a systematic review 7 , the transfer from research to practice is not straightforward, especially because of the challenges described above. Public health surveillance is therefore a natural application for artificial intelligence techniques, but more work remains to be done by epidemiologists to evaluate digital surveillance systems.
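The chain described above (text extraction, aggregation into a time series, signal detection) can be illustrated with a deliberately simple Python sketch. It assumes a hypothetical keyword list in place of a real NLP model and uses a moving-baseline threshold (flag a day whose count exceeds the mean plus k standard deviations of the preceding days) as one example of a standard statistical method. The function names, the 7-day baseline and k = 3 are illustrative assumptions, not the methods of the cited studies.

```python
from statistics import mean, stdev

# Hypothetical keyword list standing in for a real NLP model (illustration only).
ILLNESS_TERMS = ("food poisoning", "vomiting", "diarrhea", "nausea")


def mentions_illness(text: str) -> bool:
    """Flag a post if it contains any illness term (case-insensitive substring match)."""
    lowered = text.lower()
    return any(term in lowered for term in ILLNESS_TERMS)


def daily_counts(posts_by_day: list[list[str]]) -> list[int]:
    """Aggregate flagged posts into one count per day (the surveillance time series)."""
    return [sum(mentions_illness(post) for post in day) for day in posts_by_day]


def detect_signals(counts: list[int], baseline: int = 7, k: float = 3.0) -> list[int]:
    """Return indices of days whose count exceeds mean + k * SD of the preceding `baseline` days."""
    alerts = []
    for t in range(baseline, len(counts)):
        window = counts[t - baseline:t]
        mu, sd = mean(window), stdev(window)
        # With a perfectly flat baseline (SD = 0), any increase above the mean is flagged.
        threshold = mu + k * sd if sd > 0 else mu
        if counts[t] > threshold:
            alerts.append(t)
    return alerts


if __name__ == "__main__":
    posts = [["Great pasta", "I got food poisoning here"],
             ["Vomiting all night after dinner", "Nausea and diarrhea"]]
    print(daily_counts(posts))  # [1, 2]
    series = [2, 3, 2, 4, 3, 2, 3, 3, 15, 3]
    print(detect_signals(series))  # [8]
```

`baseline` must be at least 2 because `statistics.stdev` needs two data points. A real system would typically replace the keyword match with a trained classifier and account for seasonality and day-of-week effects, which this rule ignores.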
Paper Selection
A comprehensive literature search was performed using two bibliographic databases: PubMed/MEDLINE (from the NCBI, National Center for Biotechnology Information) and Web of Science (from Clarivate Analytics). The search targeted public health and epidemiology papers involving computer science or the massive amount of web-generated data. References addressing topics of other sections of the Yearbook, such as interoperability between data providers, were excluded from our search. The search was performed in early January 2019 and, restricted to the year 2018, returned a total of 805 references.
The two section editors reviewed the articles independently and first classified each of them into one of three categories: keep, discard, or pending. The two lists of references were then merged, yielding 74 references that had been kept by at least one editor or classified as “pending” by both. The two section editors jointly reviewed these 74 references and agreed on a list of 15 candidate best papers. All 15 preselected papers were then peer-reviewed by the section editors and external reviewers (at least four reviewers per paper). Three papers 8 9 10 were finally selected as best papers (see Table 1). A content summary of these papers can be found in the appendix of this synopsis. The whole selection process has been described by Lamy et al. 11 .
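The merging rule has one subtle case: a reference left pending by one editor and discarded by the other is not retained. A minimal Python sketch makes the rule explicit. The `Decision` enum, the `merge_reviews` function and the reference identifiers are hypothetical illustrations, not the editors' actual tooling, and the sketch assumes both editors classified the same set of references.

```python
from enum import Enum


class Decision(Enum):
    KEEP = "keep"
    DISCARD = "discard"
    PENDING = "pending"


def merge_reviews(editor_a: dict[str, Decision], editor_b: dict[str, Decision]) -> set[str]:
    """Retain a reference kept by at least one editor, or left pending by both editors."""
    retained = set()
    # Only references classified by both editors are considered (assumption).
    for ref in editor_a.keys() & editor_b.keys():
        a, b = editor_a[ref], editor_b[ref]
        kept_by_one = Decision.KEEP in (a, b)
        pending_by_both = a is Decision.PENDING and b is Decision.PENDING
        if kept_by_one or pending_by_both:
            retained.add(ref)
    return retained


if __name__ == "__main__":
    a = {"ref1": Decision.KEEP, "ref2": Decision.DISCARD,
         "ref3": Decision.PENDING, "ref4": Decision.PENDING}
    b = {"ref1": Decision.DISCARD, "ref2": Decision.DISCARD,
         "ref3": Decision.PENDING, "ref4": Decision.DISCARD}
    print(sorted(merge_reviews(a, b)))  # ['ref1', 'ref3']
```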
Table 1. Best paper selection of articles for the IMIA Yearbook of Medical Informatics 2019 in the section ‘Public Health and Epidemiology Informatics’. The articles are listed in alphabetical order of the first author’s surname.
Section Public Health and Epidemiology Informatics |
---|
▪ Arsevska E, Valentin S, Rabatel J, de Goër de Hervé J, Falala S, Lancelot R, Roche M. Web monitoring of emerging animal infectious diseases integrated in the French Animal Health Epidemic Intelligence System. PLoS One 2018 Aug 3;13(8):e0199960. |
▪ Effland T, Lawson A, Balter S, Devinney K, Reddy V, Waechter H, Gravano L, Hsu D. Discovering foodborne illness in online restaurant reviews. J Am Med Inform Assoc 2018 Dec 1;25(12):1586-92. |
▪ Wakamiya S, Kawai Y, Aramaki E. Twitter-Based Influenza Detection After Flu Peak via Tweets With Indirect Information: Text Mining Study. JMIR Public Health Surveill 2018 Sep 25;4(3):e65. |
Outlook and Conclusion
As in 2017, the papers published in 2018 and selected by our review mainly addressed public health surveillance using EHRs and social media, particularly Twitter. Hospital databases are increasingly considered as a source for surveillance 12 13 14 . A paper 15 presents pilot studies of public health surveillance of chronic diseases and risk factors conducted in several states of the United States. The topics were type 2 diabetes (based on hemoglobin A1c), pediatric asthma, amyotrophic lateral sclerosis, obesity, and smoking. All these studies constituted a valuable proof of concept, but challenges remain in defining the algorithms and standardizing them across states and countries. Several papers addressed the surveillance of seasonal influenza using hospital databases 12 , 13 , 16 . When several sources were evaluated, coupling standard surveillance systems with EHRs appeared to be the best approach, at least for influenza surveillance in the United States 16 . This combination outperformed influenza-related search engine data and Twitter-based flu activity data 16 . In France, the EHR source alone gave results closer to the reference surveillance system (the “Sentinelles” network) than Google data did, at both the national and regional scales 12 .
The range of outcomes that can be monitored using hospital databases is broad. Other published applications of EHRs include the surveillance of antibiotic consumption 17 and of hospital-acquired infections 14 . Surveillance systems are also proposed to alert healthcare professionals in real time. The impact of these surveillance techniques should be evaluated along several dimensions. How informative is the alert? How valid is it? How does it change the practices of healthcare professionals? Does it improve the patient’s condition? For instance, the evaluation of an electronic sepsis surveillance system used in an emergency department did not show any significant impact on the final clinical outcome, mortality 18 .
Social media provide a source of information for the real-time surveillance of various public health outcomes, such as influenza 9 , foodborne illnesses 10 , or heat alerts 19 , which may allow investigation or action when an alert is raised, for instance in mass gathering settings 19 . Exploiting these data requires specific approaches to define the outcomes of interest according to the available information 9 , 10 and to extract the signal using various machine learning techniques 12 . The approaches are then evaluated by comparison with reference surveillance systems, which are often imperfect gold standards. The metrics used for these comparisons are usually correlation coefficients 4 , 9 , 12 , 13 . Other metrics, such as sensitivity, specificity, accuracy, precision, and recall, could usefully complement them and provide a better evaluation of the validity of surveillance tools 8 .
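The contrast between these two families of metrics can be sketched in Python. The sketch assumes that a surveillance tool produces a weekly series, compared with the reference series by a Pearson correlation, and, after thresholding, binary weekly alerts, compared with the weeks flagged by the reference system. The function names and toy data are hypothetical. A series proportional to the reference has a correlation of 1 whatever its absolute level, so a high correlation alone says little about whether individual alerts are right; confusion-matrix metrics address that aspect. Recall is the same quantity as sensitivity, and precision is the positive predictive value.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def _ratio(num: int, den: int) -> float:
    """Safe division: NaN when the denominator is zero (metric undefined)."""
    return num / den if den else float("nan")


def alert_metrics(predicted: list[bool], reference: list[bool]) -> dict[str, float]:
    """Confusion-matrix metrics of binary alerts against a (possibly imperfect) reference."""
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    fp = sum(p and not r for p, r in pairs)
    fn = sum(not p and r for p, r in pairs)
    tn = sum(not p and not r for p, r in pairs)
    sensitivity = _ratio(tp, tp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),  # positive predictive value
        "recall": sensitivity,             # same quantity as sensitivity
        "accuracy": _ratio(tp + tn, len(pairs)),
    }


if __name__ == "__main__":
    reference_cases = [10, 20, 30, 40]
    tool_signal = [1, 2, 3, 4]  # ten times lower, same shape
    print(round(pearson(tool_signal, reference_cases), 6))  # 1.0
    predicted = [True, True, False, False, True]
    reference = [True, False, False, True, True]
    print(alert_metrics(predicted, reference))
```

Undefined metrics (for example, sensitivity when the reference contains no epidemic week) are returned as NaN rather than raising an error; this is a design choice of the sketch.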
In conclusion, although these new information sources offer considerable opportunities, much work remains to be done to exploit and validate them.
Acknowledgements
We would like to thank the external reviewers for their participation in the selection process of the Public Health and Epidemiology Informatics section of the IMIA Yearbook.
References
1. Thiebaut R, Thiessard F. Public Health and Epidemiology Informatics. Yearb Med Inform. 2017;26(01):248–50. doi: 10.15265/IY-2017-036.
2. Eysenbach G. Infodemiology and infoveillance: framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the Internet. J Med Internet Res. 2009;11:e11. doi: 10.2196/jmir.1157.
3. Sciascia S, Radin M, Unlu O, Erkan D, Roccatello D. Infodemiology of antiphospholipid syndrome: Merging informatics and epidemiology. Eur J Rheumatol. 2018;5:92–5. doi: 10.5152/eurjrheum.2018.17105.
4. Mavragani A, Sampri A, Sypsa K, Tsagarakis K P. Integrating Smart Health in the US Health Care System: Infodemiology Study of Asthma Monitoring in the Google Era. JMIR Public Health Surveill. 2018;4:e24. doi: 10.2196/publichealth.8726.
5. Murphy S N, Mendis M E, Berkowitz D A, Kohane I, Chueh H C. Integration of clinical and genetic data in the i2b2 architecture. AMIA Annu Symp Proc. 2006:1040. Available at: http://www.ncbi.nlm.nih.gov/pubmed/17238659 [Accessed May 31, 2019].
6. Nelson E K, Piehler B, Eckels J, Rauch A, Bellew M, Hussey P et al. LabKey Server: An open source platform for scientific data integration, analysis and collaboration. BMC Bioinformatics. 2011;12:71. doi: 10.1186/1471-2105-12-71.
7. Charles-Smith L E, Reynolds T L, Cameron M A, Conway M, Lau E HY, Olsen J M et al. Using Social Media for Actionable Disease Surveillance and Outbreak Management: A Systematic Literature Review. PLoS One. 2015;10:e0139701. doi: 10.1371/journal.pone.0139701.
8. Arsevska E, Valentin S, Rabatel J, de Goër de Hervé J, Falala S et al. Web monitoring of emerging animal infectious diseases integrated in the French Animal Health Epidemic Intelligence System. PLoS One. 2018;13:e0199960. doi: 10.1371/journal.pone.0199960.
9. Wakamiya S, Kawai Y, Aramaki E. Twitter-Based Influenza Detection After Flu Peak via Tweets With Indirect Information: Text Mining Study. JMIR Public Health Surveill. 2018;4:e65. doi: 10.2196/publichealth.8627.
10. Effland T, Lawson A, Balter S, Devinney K, Reddy V, Waechter H et al. Discovering foodborne illness in online restaurant reviews. J Am Med Inform Assoc. 2018;25:1586–92. doi: 10.1093/jamia/ocx093.
11. Lamy J B, Séroussi B, Griffon N, Kerdelhué G, Jaulent M C, Bouaud J. Toward a Formalization of the Process to Select IMIA Yearbook Best Papers. Methods Inf Med. 2015;54:135–44. doi: 10.3414/ME14-01-0031.
12. Poirier C, Lavenu A, Bertaud V, Campillo-Gimenez B, Chazard E, Cuggia M et al. Machine Learning Methods: Comparison Study. JMIR Public Health Surveill. 2018;4:e11361. doi: 10.2196/11361.
13. Bouzillé G, Poirier C, Campillo-Gimenez B, Aubert M L, Chabot M, Chazard E et al. Leveraging hospital big data to monitor flu epidemics. Comput Methods Programs Biomed. 2018;154:153–60. doi: 10.1016/j.cmpb.2017.11.012.
14. Ehrentraut C, Ekholm M, Tanushi H, Tiedemann J, Dalianis H. Detecting hospital-acquired infections: A document classification approach using support vector machines and gradient tree boosting. Health Informatics J. 2018;24:24–42. doi: 10.1177/1460458216656471.
15. Namulanda G, Qualters J, Vaidyanathan A, Roberts E, Richardson M, Fraser A et al. Electronic health record case studies to advance environmental public health tracking. J Biomed Inform. 2018;79:98–104. doi: 10.1016/j.jbi.2018.02.012.
16. Ertem Z, Raymond D, Meyers L A. Optimal multisource forecasting of seasonal influenza. PLoS Comput Biol. 2018;14:e1006236. doi: 10.1371/journal.pcbi.1006236.
17. Schweickert B, Feig M, Schneider M, Willrich N, Behnke M, Peña Diaz L A et al. Antibiotic consumption in Germany: first data of a newly implemented web-based tool for local and national surveillance. J Antimicrob Chemother. 2018;73:3505–15. doi: 10.1093/jac/dky345.
18. Austrian J S, Jamin C T, Doty G R, Blecker S. Impact of an emergency department electronic sepsis surveillance system on patient mortality and length of stay. J Am Med Inform Assoc. 2018;25:523–9. doi: 10.1093/jamia/ocx072.
19. Khan Y, Leung G J, Belanger P, Gournis E, Buckeridge D L, Liu L et al. Comparing Twitter data to routine data sources in public health surveillance for the 2015 Pan/Parapan American Games: an ecological study. Can J Public Health. 2018;109:419–26. doi: 10.17269/s41997-018-0059-0.