Table 3. Summary of statistical and machine learning methods and data sources for event detection using Twitter data.

| Public Health Issue | Method | Complementary Data |
|---|---|---|
| Cancer | Support Vector Machine [61] | CDC |
| Smoking | Bayesian Logistic Regression [62] | |
| Suicide | ARIMA (Autoregressive Integrated Moving Average) [26] | |
| Harmful Algal Blooms (HABs) | Deep Learning (CNN) [59] | |
| HIV | Decision Tree [63], Support Vector Machine [63], Graph Modelling [27], Multilayer Perceptron [63] | |
| Allergies | [64], Bayesian Inference [64] | pollen.com, National Climatic Data Center Climate Data Online (CDO) |
| Drug Abuse | Biterm Topic Model [55], Decision Tree [65], Support Vector Machine [58], Topic Model [66] | |
| HPV | Decision Tree [67], Linear Classifier [67] | |
| Infectious Intestinal Diseases (IID) | Word2Vec [68], Gaussian Process [68] | Public Health England |
| Adverse Drug Events (ADE) | Multi-Instance Logistic Regression [69] | |
| Depression | Non-Negative Matrix Factorization [70], ARIMA (Autoregressive Integrated Moving Average) [26], Simple Statistical Analysis [71], Stepwise Regression [60] | National Climatic Data Center, National Oceanic and Atmospheric Administration (NOAA) |
| Ebola | Lexicon Analysis [56], Support Vector Machine [56] | |
| Back Pain | Logistic Regression [72] | |
| Vomiting | TSVM [22], ARIMA (Autoregressive Integrated Moving Average) [22] | Public Health England |
| Gastroenteritis | TSVM [22], ARIMA (Autoregressive Integrated Moving Average) [22] | Public Health England |
| Asthma | Support Vector Machine [61] | CDC |
| Food Borne Illness | K-Nearest Neighbour [73], Support Vector Machine [32] | Southern Nevada Health District (SNHD), CDC |
| Earthquake | Clustering [19], Bayesian Inference [19] | |
| Diabetes | Support Vector Machine [61] | CDC |
| Dental Pain | Simple Statistical Analysis [74] | |
| Influenza-like Illnesses (ILIs) | Clustering [75], Lexicon Analysis [35,57,76], Deep Learning (RNN) [36], Logistic Regression [77], Gaussian Process [78], Deep Learning (CNN) [36], Outlier Detection [46], Bayesian Inference [35,57], fastText [36], ARIMA (Autoregressive Integrated Moving Average) [22], GloVe [36], FP-Growth [37], Trap Model [79], Support Vector Machine [37,77,80], Shallow MLP [81], TSVM [22], Word2Vec [75], Regression [80] | Penn State's Health Services, Infectious Disease Surveillance Center, Royal College of General Practitioners (RCGP), Public Health England, CDC |
| General Health<sup>a</sup> | Support Vector Machine [82], Lexicon Analysis [82] | |
| Diarrhoea | TSVM [22], ARIMA (Autoregressive Integrated Moving Average) [22] | Public Health England |
| Obesity | DBSCAN (Clustering) [54] | |
| Middle East Respiratory Syndrome (MERS) | Lexicon Analysis [56], Support Vector Machine [56] | |

<sup>a</sup> Generic feelings of unwellness and non-specific illness.