Table 2: A summary of the primary few-shot approaches and evaluation methodologies.
| Study | Task | Primary approach(es) | Evaluation metric(s) |
|---|---|---|---|
| Rios et al. [16] | Multi-label Text Classification | Neural architecture suitable for handling few- and zero-shot labels in the multi-label setting where the output label space satisfies two constraints: (1) the labels are connected forming a DAG; (2) each label has a brief natural language descriptor. | R@k (Recall@k), P@k (Precision@k), Macro-F1 scores |
| Rios et al. [31] | Multi-label Text Classification | Semi-parametric neural matching network for diagnosis/procedure code prediction from EMR narratives. | Precision, Recall, F1-scores, AUC (PR), AUC (ROC), P@k, R@k |
| Hofer et al. [10] | NER | Five improvements for NER when only 10 annotated examples are available: (1) layer-wise initialization with pre-trained weights (single pre-training); (2) hyperparameter tuning; (3) combining pre-training data; (4) custom word embeddings; (5) optimizing out-of-vocabulary (OOV) words. | F1-score |
| Pham et al. [39] | Neural Machine Translation (NMT) | A generic approach to address the challenge of rare word translation in NMT by using external phrase-based models to annotate the training data as experts. A pointer network is used to control the model-expert interaction. The trained model is able to copy the annotations into the output consistently. | BLEU score, SUGGESTION (SUG), SUGGESTION ACCURACY (SAC) |
| Yan et al. [42] | Text Classification | Short text classification framework based on Siamese CNNs and few-shot learning, which learns a discriminative text encoding to help classifiers distinguish obscure or informal sentences. The different sentence structures and descriptions of a topic are learned by a few-shot learning strategy to improve the classifier’s generalization. | Accuracy |
| Manousogiannis et al. [47] | Concept Extraction | A simple few-shot learning approach, based on pre-trained word embeddings and data from the UMLS, combined with the provided training data. | Relaxed and strict Precision/Recall/F1-scores |
| Gao et al. [49] | Relation Classification | Propose FewRel 2.0, a new task covering two real-world issues that FewRel ignores: few-shot domain adaptation and few-shot none-of-the-above detection. | Accuracy |
| Lara-Clares et al. [51] | NER | Hybrid Bi-LSTM and CNN model to recognize multi-word entities. Learns high-level features from datasets using a few-shot learning model. Wikipedia2vec is used for automatic extraction and classification of keywords. | F1-score |
| Ferré et al. [53] | Entity Normalization | A new neural approach (C-Norm) which synergistically combines standard and weak supervision, ontological knowledge integration and distributional semantics. | The official evaluation tool of the BB-norm task: a similarity score and a strict exact-match score. |
| Hou et al. [55] | Slot Tagging (NER) | Introduction of a collapsed dependency transfer mechanism into CRF to transfer abstract label dependency patterns in the form of transition scores. The emission score of CRF is computed as the word similarity with respect to each label representation. A Label-enhanced Task-Adaptive Projection Network (L-TapNet) based on TapNet is used to compute the similarity by leveraging label name semantics in representing labels. | F1-score |
| Sharaf et al. [57] | Neural Machine Translation (NMT) | Framing the adaptation of NMT systems as a meta-learning problem. The model can learn to adapt to new unseen domains based on simulated offline meta-training domain adaptation tasks. | BLEU, SacreBLEU (to measure case-sensitive de-tokenized BLEU) |
| Lu et al. [59] | Multi-label Text Classification | A simple multi-graph aggregation model that fuses knowledge from multiple label graphs encoding different semantic label relationships to incorporate aggregated knowledge in multi-label zero/few-shot document classification. Three kinds of semantic information are used: pre-trained word embeddings; label description; pre-defined label relations. | Recall@K, nDCG@K |
| Jia et al. [61] | NER | Creation of distinct feature distributions for each entity type across domains, which improves transfer learning power, as compared to representation networks that do not explicitly differentiate between entity types. | F1-score |
| Chalkidis et al. [66] | Multi-label Text Classification | Hierarchical methods based on Probabilistic Label Trees (PLTs); combines BERT with LWAN; uses structural information from the label hierarchy in LWAN. Leverages the label hierarchy to improve few- and zero-shot learning. | R-Precision@K (a top-K version of R-Precision of each document), nDCG@K |
| Lwowski et al. [68] | Text Classification | A self-supervised learning algorithm to monitor COVID-19 on Twitter, using an autoencoder to learn latent representations. Knowledge is transferred to a COVID-19 infection classifier by fine-tuning a Multi-Layer Perceptron (MLP) using few-shot learning. | Accuracy, Precision, Recall, F1-score |
| Hou et al. [9] | Dialogue Language Understanding with two sub-tasks: Intent Detection (classification) and Slot Tagging (sequence labeling) | A novel few-shot learning benchmark for NLP (FewJoint). Introduces few-shot joint dialogue language understanding, which additionally covers the problems of structure prediction and multi-task reliance. | Intent Accuracy, Slot F1-score, Sentence Accuracy |
| Chen et al. [70] | Natural Language Generation (NLG) | The design of the model architecture is based on two aspects: content selection from input data and language modeling to compose coherent sentences, which can be acquired from prior knowledge. | BLEU-4, ROUGE-4 (F-measure) |
| Vaci et al. [72] | Concept Extraction | Used a combination of methods to extract salient information from electronic health records. First, clinical experts defined the information of interest and subsequently built the training and testing corpora for statistical models. Second, the statistical models were built and fine-tuned using active learning procedures. | Precision, Recall, F1-score |
| Huang et al. [73] | NER | The first systematic study for few-shot NER. Three distinctive schemes (and their combinations) are investigated: (1) meta-learning to construct entity prototypes; (2) supervised pre-training to obtain generic entity representations; (3) self-supervised training to utilize unlabeled in-domain data. | F1-score |
| Chen et al. [74] | Classification | A classification and diagnosis method for Alzheimer’s patients based on multi-modal feature fusion and small-sample learning. A compressed interactive network explicitly fuses the extracted features at the vector level. Finally, a KNN attention pooling layer and a convolutional network are used to construct a small-sample learning network to classify the patient diagnosis data. | Accuracy, F1-score |
| Yin et al. [75] | Sequence Tagging (Event trigger identification) | Combination of a prototypical network and a relation network module to model the task of biomedical event trigger identification. In addition, to make full use of the external knowledge base to learn the complex biological context, a self-attention mechanism is introduced. | F1-score |
| Goodwin et al. [77] | Abstractive Summarization | Highly-abstractive multi-document summarization conditioned on user-defined query using BART, T5, and PEGASUS. | ROUGE-1, ROUGE-2, ROUGE-L F1-scores, BLEU-4, Repetition Rate |
| Yang et al. [78] | NER | Uses an NER model trained under supervision on source domain for feature extraction. Structured decoding is used with nearest neighbor learning instead of expensive CRF training. | F1-score |
| Hartmann et al. [82] | Concept Extraction | A universal approach to multilingual negation scope resolution: zero-shot cross-lingual transfer for negation scope resolution in clinical text. Exploits data from disparate sources by data concatenation, or in a multi-task learning (MTL) setup. | Percentage of correct spans (PCS), F1-score over scope tokens |
| Fivez et al. [86] | Name Normalization | Propose truly robust representations, which capture more domain-specific semantics while remaining universally applicable across different biomedical corpora and domains. Use conceptual grounding constraints which more effectively align encoded names to pretrained embeddings of their concept identifiers. | For synonym retrieval: Mean average precision (mAP) over all synonyms. For concept mapping: Accuracy (Acc) and Mean reciprocal rank (MRR) of the highest ranked correct synonym. |
| Lu et al. [87] | Rumor Detection¹ | A few-shot learning-based multi-modality fusion model for COVID-19 rumor detection. It includes a text embedding module with a pre-trained BERT model, a feature extraction module with multi-layer Bi-GRUs, and a multi-modality feature fusion module with a fusion layer. Uses a meta-learning-based few-shot learning paradigm. | Accuracy |
| Ma et al. [89] | Drug-response Predictions | Applied the few-shot learning paradigm to three context-transfer challenges: transfer of a predictive model learned in one tissue type to the distinct contexts of other tissues; transfer of a predictive model learned in tumor cell lines to patient-derived tumor cell (PDTC) cultures in vitro; transfer of a predictive model learned in tumor cell lines to the context of patient-derived tumor xenografts (PDXs) in mice in vivo. | Accuracy, Pearson’s correlation, AUC |
| Kormilitzin et al. [90] | NER | Self-supervised training of deep neural network language model using the cloze-style approach. Synthetic training data with noisy labels is created using weak supervision. All constituent components are combined into an active learning approach. | Accuracy, Precision, Recall, F1-score |
| Guo et al. [91] | Entity Relation Extraction | A Siamese graph neural network (BioGraphSAGE) with structured databases as domain knowledge to extract biological entity relations from literature. | Precision (P), Recall (R), F1-score |
| Lee et al. [92] | Fact-Checking (Text Classification) | Propose evidence-conditioned perplexity, a novel way of leveraging the perplexity score from LMs for the few-shot fact-checking task. | Accuracy, Macro-F1-score |
| Fivez et al. [96] | Name Normalization | A scalable few-shot learning approach for robust biomedical name representations. Training a simple encoder architecture in a few-shot setting using small subsamples of general higher-level concepts which span a large range of fine-grained concepts. | Spearman’s rank correlation coefficient |
| Xiao et al. [97] | Relation Classification | Adaptive prototypical networks with label words and joint representation learning based on metric learning for few-shot relation classification (FSRC), which performs classification by calculating distances in the learned metric space. | Accuracy |
| Ziletti et al. [98] | Medical Coding (classification) | Combines traditional BERT-based classification with task-aware representation of sentences, a zero/few-shot learning approach that leverages label semantics. | Accuracy |
| Ye et al. [99] | Cross-task Generalization | Present CROSSFIT, a few-shot learning challenge to acquire, evaluate and analyze cross-task generalization in a realistic setting. Additionally, introduce the NLP Few-shot Gym, a repository of 160 few-shot NLP tasks gathered from open-access resources. | Average Relative Gain (ARG) |
| Aly et al. [100] | NER and classification (NERC) | Present the first approach for zero-shot NERC, using transformers with cross-attention to leverage naturally occurring entity type descriptions. The negative class is modeled by: (1) description-based encoding, (2) independent (direct) encoding, and (3) class-aware encoding. | F1-score |
| Wright et al. [101] | Exaggeration Detection¹ (Information Extraction) | Propose multi-task Pattern Exploiting Training (MT-PET) to leverage knowledge from auxiliary cloze-style QA tasks for few-shot learning. Present a set of labeled press release/abstract pairs from existing expert-annotated studies on exaggeration in the press releases of scientific papers, suitable for benchmarking the performance of machine learning models. | Precision, Recall, F1-score |
| Lee et al. [102] | NER | Present a simple demonstration-based learning method for NER, which lets the input be prefaced by task demonstrations for in-context learning, and perform a systematic study on demonstration strategy regarding what to include, how to select the examples, and what templates to use. | F1-score |
| Wang et al. [103] | Classification | Propose a prompt-based learning approach, which treats the assertion classification task as a masked language auto-completion problem. | Comprehensiveness, Sufficiency (measuring the extent to which the model adheres to human rationales) |
| Yan et al. [104] | NER | Proposes a text mining pipeline for enabling the FAIR neuroimaging study. In order to avoid fragmented information, the Brain Informatics provenance model is redesigned based on NIDM (Neuroimaging Data Model) and FAIR facets. | Precision, Recall, F1-score |
| Lin et al. [105] | Information Extraction | Proposes a literature mining-based approach for research sharing-oriented neuroimaging provenance construction. A joint extraction model based on deep adversarial learning, called AT-NeuroEAE, is proposed to realize the event extraction in a few-shot learning scenario. | Precision, Recall, F1-score |
| Riveland et al. [106] | Classification | Present neural models of one of humans’ most astonishing cognitive feats: the ability to interpret linguistic instructions in order to perform novel tasks with just a few practice trials. Models are trained on a set of commonly studied psychophysical tasks, and receive linguistic instructions embedded by transformer architectures pre-trained on natural language processing. | Accuracy |
| Navarro et al. [107] | Abstractive Summarization² | Fine-tuned several state-of-the-art (SOTA) models on a newly created medical dialogue dataset of 143 snippets, based on 27 general practice conversations paired with their respective summaries. | ROUGE scores |
| Das et al. [108] | NER | Present CONTAINER, a novel contrastive learning technique that optimizes the inter-token distribution distance for few-shot NER. Instead of optimizing class-specific attributes, CONTAINER optimizes a generalized objective of differentiating between token categories based on their Gaussian-distributed embeddings. | F1-score |
| Ma et al. [109] | NER | Leveraging the semantic information in the names of the labels as a way of giving the model additional signal and enriched priors. Propose a neural architecture consisting of two BERT encoders, one for document encoding and another for label encoding. | F1-score |
| Parmar et al. [110] | Multi-Task Learning² | Explores the impact of instructional prompts for biomedical MTL. Introduce BoX, a collection of 32 instruction tasks for Biomedical NLP across various categories. Propose a unified model (In-BoXBART) using this meta-dataset that can jointly learn all BoX tasks without any task-specific modules. | ROUGE-L, F1-score |
| Boulanger et al. [111] | NER | Use the generative capacity of LLMs to create unlabelled synthetic data. Semi-supervised learning is used for NER in a low resource setup. | F1-score |
| Yeh et al. [112] | Relation Extraction | Present a simple yet effective method to systematically generate comprehensive prompts that reformulate the relation extraction task as a cloze-test task under a simple prompt formulation. In particular, experiment with different ranking scores for prompt selection. | F1-score |
| Pan et al. [113] | Question Answering | Supervised pretraining on source-domain data to reduce sample complexity on domain-specific downstream tasks. Zero-shot performance on domain-specific reading comprehension tasks is evaluated by combining task transfer with domain adaptation to fine-tune a pre-trained model with no labelled data from the target task. | F1-score |
| Wadden et al. [114] | Scientific Claim Verification | Present MULTIVERS, which predicts a fact-checking label and identifies rationales in a multitask fashion based on a shared encoding of the claim and full document context using weakly-supervised domain adaptation. | Precision, Recall, F1-score |
| Li et al. [115] | Relation Classification | Learn a prototype encoder from relation definition text in a way that is useful for relation instance classification. Use a joint training approach to train both a prototype encoder from definition and an instance encoder. | Accuracy |
| Zhang et al. [116] | Natural Language Inference (NLI) | An instance-discrimination-based approach (PairSupCon) to bridge semantic entailment and contradiction understanding with high-level categorical concept encoding. | Clustering Accuracy |
¹ Denotes papers where a new non-biomedical FSL dataset is introduced.
² Denotes papers where a new FSL dataset specific to the biomedical domain is introduced.
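Several of the studies above (e.g., Huang et al. [73], Yin et al. [75], Xiao et al. [97], Li et al. [115]) build on prototypical networks, which classify a query by its distance to per-class prototypes averaged from a handful of support examples. The following is a minimal illustrative sketch of that shared mechanism only — toy 2-d embeddings and plain Euclidean distance, not any specific paper's encoder or metric:

```python
import math

def prototypes(support, labels, n_classes):
    """Class prototype = mean embedding of that class's support examples."""
    protos = []
    for c in range(n_classes):
        members = [emb for emb, lab in zip(support, labels) if lab == c]
        protos.append([sum(dim) / len(members) for dim in zip(*members)])
    return protos

def classify(query, protos):
    """Assign the query to the nearest prototype by Euclidean distance."""
    dists = [math.dist(query, p) for p in protos]
    return dists.index(min(dists))

# Toy 2-way 2-shot episode with 2-d "embeddings"
support = [[0.0, 0.0], [0.2, 0.1],   # class 0
           [1.0, 1.0], [0.9, 1.1]]   # class 1
labels = [0, 0, 1, 1]
protos = prototypes(support, labels, n_classes=2)
print([classify(q, protos) for q in [[0.1, 0.0], [1.0, 0.9]]])  # → [0, 1]
```

In the papers above, the raw lists would be replaced by learned embeddings (e.g., BERT outputs), and the distance function itself may be learned, as in the metric-learning variants of [97] and [115].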