| Ref. | Application domain | NLP method | Dataset |
|---|---|---|---|
| [30] | EHRs | BERT; a list of 60 regular expressions for specific COVID-19-associated phenotypes (NLP RegExp); the quickUMLS algorithm [97] for extracting all signs, symptoms, and comorbidities (NLP UMLS) | A multi-center study involving data from 39 hospitals |
| [98] | EHRs | Keyword-extraction NLP using an unsupervised ML approach (clustering) | 450,114 comprehensive patient CT reports gathered from 1 January to October 2020 |
| [99] | EHRs | Word frequency for text analytics and a CNN trained on Word2vec embeddings as the classification model | Data collected through telehealth visits, covering 6813 patients, of whom 498 tested positive and 6315 tested negative |
| [32] | EHRs | NLP model (medical named entity recognition) | Audio or video recordings of clinic visits |
| [38] | EHRs | Multi-class logistic regression model trained on n-gram features | The study cohort includes 1737 adult COVID-19 patients discharged from two hospitals in Boston, Massachusetts, between 10 March and 30 June 2022 |
| [39] | EHRs | NLP rule-based pipeline | Clinical data from the VA Corporate Data Warehouse (CDW) between 1 January and 15 June 2020 |
| [33] | EHRs | Random forest trained on n-grams | 32,555 radiology reports from brain CTs and MRIs from a comprehensive stroke center |
| [34] | EHRs | NLP rule-based pipeline | 6250 patients (5664 negative and 586 positive; 46,138 non-severe and 125 severe) |
| [36] | EHRs | BERT and Bi-LSTM with attention | 1472 annotated clinical notes distinguishing COVID-19 diagnoses, testing, and symptoms |
| [35] | EHRs | NLP rule-based pipeline | Validated on several datasets; the main COVID-19 dataset contains 50 posts (1162 sentences) of related dialogues |
| [44] | Mental health | Supervised text classification using a stochastic gradient descent linear classifier with L1 penalty on TF-IDF n-grams; principal component analysis with k-NN for unsupervised clustering; LDA for topic modeling | Social media: Reddit Mental Health Dataset, including posts from 826,961 unique users |
| [45] | Mental health | BERT (fine-tuned) | Social media: 1000 English tweets for training the model and 1 million tweets included in the analysis |
| [46] | Mental health | A sentiment analysis system called CrystalFeel | Social media: over 20 million COVID-19 tweets between 28 January and April 2020 |
| [47] | Mental health | Key-phrase extraction and sentiment scoring using a lexicon-based technique | Social media: 47 million COVID-19-related comments extracted from Twitter, Facebook, and YouTube |
| [100] | Mental health | Bi-directional LSTM with a self-attention layer | Social media: approximately 900,000 tweets from several countries in the diagnosed group and approximately 14 million tweets from several countries in the control group |
| [48] | Mental health | Sentence-BERT (SBERT) | 9090 English free-form texts from 1451 students between 1 February and 30 April 2020 |
| [52] | Health behaviors | BERT | 1.1 million COVID-19-related tweets from 181 counties in the US |
| [54] | Health behaviors | t-Distributed Stochastic Neighbor Embedding (t-SNE); DistilBART; VADER for sentiment analysis; Google’s Universal Sentence Encoder | 189,958,459 English COVID-19-related tweets between 17 March and 27 July 2020 |
| [55] | Health behaviors | SVM, XGBoost, and LSTM | 771,268 tweets from the US between January and October 2020 |
| [56] | Health behaviors | LDA for topic modeling and aspect-based sentiment analysis | English COVID-19 tweets: 25,595 for Canada and 293,929 for the US |
| [57] | Health behaviors | BERT | 2,349,659 tweets related to COVID-19 vaccination, collected 1 month after the first vaccine announcement |
| [71] | Misinformation detection | Uses the SAFE system developed in [53] | 2029 news articles on COVID-19 (between January and May 2020) and 140,820 tweets that disclose how these news articles circulated on Twitter |
| [76] | Misinformation detection | NLP and network analysis methods | 4573 annotated tweets from 3629 users |
| [73] | Misinformation detection | SVM | 10,700 social media posts and articles of real and fake news on COVID-19 |
| [101] | Misinformation detection | Sentence-BERT and BERTScore | 4800 expert-annotated social media posts |
| [77] | Misinformation detection | BERT and ALBERT | 5500 claim and explanation pairs |
| [90] | COVID QA systems | BERT and LDA | COVID-19 scientific publications: CORD-19 dataset |
| [83] | COVID QA systems | T5 | COVID-19 scientific publications: CORD-19 dataset |
| [102] | COVID QA systems | An ensemble of two QA models (HLTC-MRQA and BioBERT) for question answering; BART [88] for abstractive summarization; ALBERT [89] for extractive summarization | COVID-19 scientific publications: CORD-19 dataset |
| [91] | COVID QA systems | BioBERT | COVID-19 scientific publications: CORD-19 dataset, with an additional 111 QA pairs annotated for testing |
| [92] | COVID QA systems | Synthetically generated QA examples to optimize QA performance on closed domains; machine reading comprehension uses the RoBERTa model | COVID-19 scientific publications: CORD-19 dataset |
| [95] | Knowledge transfer | XLM-R Large | The M-CID dataset, containing 5271 utterances across English, Spanish, French, and Spanglish |
| [96] | Knowledge transfer | Multilingual Universal Sentence Encoder [103] | 4,683,226 geo-referenced tweets in 60 languages located in Europe |
| [94] | Knowledge transfer | Variant of the Transformer (big) architecture | The model is trained on more than 350 million sentences in French, Spanish, German, Italian, and Korean (translated into English) |
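Several of the surveyed studies (e.g., [44], [56]) rely on classical pipelines that combine TF-IDF n-gram features with a stochastic gradient descent linear classifier and LDA topic modeling. The sketch below is a minimal, self-contained illustration of that style of pipeline in scikit-learn; it is not code from any of the cited papers, and the toy texts, labels, and parameter choices are placeholders chosen only for demonstration.

```python
# Minimal sketch (not from any cited study): TF-IDF n-grams + SGD linear classifier
# with an L1 penalty for supervised text classification, and LDA for topic modeling.
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.pipeline import Pipeline

# Placeholder corpus: short social-media-style posts with binary labels
# (1 = mentions COVID-19 symptoms, 0 = does not). Not real study data.
texts = [
    "lost my sense of smell and have a dry cough",
    "fever and fatigue for three days, getting tested tomorrow",
    "beautiful weather today, going for a long walk",
    "new phone arrived, the camera is great",
]
labels = [1, 1, 0, 0]

# Supervised classification: TF-IDF unigrams/bigrams feeding an SGD-trained
# linear model with an L1 penalty (encourages sparse feature weights).
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("sgd", SGDClassifier(penalty="l1", max_iter=1000, random_state=0)),
])
clf.fit(texts, labels)
print(clf.predict(["dry cough and fever for two days"]))  # predicted label for a new post

# Unsupervised topic modeling: LDA fitted on raw term counts.
counts = CountVectorizer(min_df=1)
doc_term = counts.fit_transform(texts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

# Print the top words per topic.
vocab = counts.get_feature_names_out()
for topic_idx, weights in enumerate(lda.components_):
    top = [vocab[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {topic_idx}: {', '.join(top)}")
```

The L1 penalty is what makes this setup attractive for large, sparse TF-IDF vocabularies: it drives most n-gram weights to zero, so the fitted classifier implicitly selects the few phrases most indicative of each class.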