Table 4:
The 3-step NLP/ML application and corresponding techniques among the 79 studies included in the systematic review*
| Steps and techniques | N | % |
|---|---|---|
| Step 1: Preprocessing | 60 | 75.9 |
| Annotation | 38 | 48.1 |
| Text tokenization | 36 | 45.6 |
| Remove stop-words | 18 | 22.8 |
| Part-of-speech (POS) tagging | 16 | 20.3 |
| Normalization | 14 | 17.7 |
| Lemmatization/stemming | 12 | 15.2 |
| Step 2: Feature extraction and representations | 69 | 87.3 |
| Rule-based NLP | 37 | 46.8 |
| Affirmation/negation | 33 | 41.8 |
| Word2vec/bag-of-words (BOW) | 23 | 29.1 |
| Named entity recognition (NER) | 16 | 20.3 |
| N-gram (Term Frequency–Inverse Document Frequency [TF-IDF], Document-Term Matrix [DTM], Term-Document Matrix [TDM]) | 15 | 19.0 |
| Latent Dirichlet Allocation (LDA) for topic modeling | 5 | 6.3 |
| Latent semantic indexing (LSI) | 1 | 1.3 |
| Knowledge graph | 1 | 1.3 |
| Step 3: Data analysis (non-neural ML) | 39 | 49.4 |
| Support vector machine (SVM) | 18 | 22.8 |
| Conditional random fields (CRF) | 9 | 11.4 |
| Logistic regression classifier | 8 | 10.1 |
| Decision tree (DT) | 6 | 7.6 |
| Naïve Bayes | 6 | 7.6 |
| Random forest (RF) | 6 | 7.6 |
| K-means clustering | 3 | 3.8 |
| K-nearest neighbors (KNN) | 3 | 3.8 |
| Boosting (e.g., Light Gradient Boosting Machine [LightGBM], eXtreme Gradient Boosting [XGBoost]) | 2 | 2.5 |
| Linear regression classifier | 2 | 2.5 |
| Bagging | 1 | 1.3 |
| Step 3: Data analysis (neural ML) | 22 | 27.8 |
| Convolutional neural network (CNN) | 10 | 12.7 |
| Recurrent neural network (RNN) (e.g., Bi-LSTM, GRU, GloVe) | 10 | 12.7 |
| Artificial neural network (ANN)/feed-forward network (FFN) | 7 | 8.9 |
| Transformer (e.g., Bidirectional Encoder Representations from Transformers [BERT], BioBERT) | 3 | 3.8 |
| Autoencoder | 3 | 3.8 |
| Embeddings from Language Model (ELMo) | 1 | 1.3 |
| Others | 2 | 2.5 |
Abbreviations: Bi-LSTM, Bidirectional Long Short-Term Memory; GRU, Gated Recurrent Unit; BERT, Bidirectional Encoder Representations from Transformers.
See Supplementary Table S5 for a list of references.
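The 3-step pipeline the table tallies can be sketched end-to-end with one technique from each step: tokenization with stop-word removal (Step 1), TF-IDF feature vectors (Step 2), and a KNN classifier (Step 3). This is a minimal standard-library Python illustration, not any reviewed study's implementation; the clinical snippets, labels, and stop-word list are invented for the example.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "is", "of", "and", "in", "to", "on"}

def preprocess(text):
    # Step 1: lowercase, tokenize on letter runs, remove stop words
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def tfidf_vectors(docs):
    # Step 2: term frequency-inverse document frequency (TF-IDF) features
    tokenized = [preprocess(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # smoothed IDF
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (tf[t] / len(toks)) * idf[t] for t in tf})
    return vectors

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_predict(train_vecs, labels, query_vec, k=1):
    # Step 3: k-nearest neighbors by cosine similarity, majority vote
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query_vec, train_vecs[i]),
                    reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy training notes (hypothetical data, two classes)
docs = [
    "patient reports severe chest pain and shortness of breath",
    "no acute distress, routine follow up visit",
    "chest pain radiating to left arm",
    "annual wellness visit, patient is healthy",
]
labels = ["urgent", "routine", "urgent", "routine"]

# Vectorize training notes and a new note together so IDF is shared
vecs = tfidf_vectors(docs + ["new note: chest pain on exertion"])
train, query = vecs[:4], vecs[4]
print(knn_predict(train, labels, query))  # → urgent
```

Any row of the table could be swapped into the same skeleton: a rule-based matcher or n-gram counts in place of TF-IDF, or an SVM or CRF in place of KNN.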