Table 2: A summary of the primary few-shot approaches and evaluation methodologies.
| Study | Task | Primary approach(es) | Evaluation metric(s) |
|---|---|---|---|
| Rios et al. [16] | Multi-label Text Classification | Neural architecture suitable for handling few- and zero-shot labels in the multi-label setting where the output label space satisfies two constraints: (1) the labels are connected forming a DAG; (2) each label has a brief natural language descriptor. | R@k (Recall@k), P@k (Precision@k), Macro-F1 scores |
| Rios et al. [31] | Multi-label Text Classification | Semi-parametric neural matching network for diagnosis/procedure code prediction from EMR narratives. | Precision, Recall, F1-scores, AUC (PR), AUC (ROC), P@k, R@k |
| Hofer et al. [10] | NER | Five improvements for NER when only 10 annotated examples are available: (1) layer-wise initialization with pre-trained weights (single pre-training); (2) hyperparameter tuning; (3) combining pre-training data; (4) custom word embeddings; (5) optimizing out-of-vocabulary (OOV) words. | F1-score |
| Pham et al. [39] | Neural Machine Translation (NMT) | A generic approach to address the challenge of rare word translation in NMT by using external phrase-based models to annotate the training data as experts. A pointer network is used to control the model-expert interaction. The trained model is able to copy the annotations into the output consistently. | BLEU score, SUGGESTION (SUG), SUGGESTION ACCURACY (SAC) |
| Yan et al. [42] | Text Classification | Short text classification framework based on Siamese CNNs and few-shot learning, which learns a discriminative text encoding to help classifiers distinguish obscure or informal sentences. The different sentence structures and descriptions of a topic are learned by a few-shot learning strategy to improve the classifier’s generalization. | Accuracy |
| Manousogiannis et al. [47] | Concept Extraction | A simple few-shot learning approach, based on pre-trained word embeddings and data from the UMLS, combined with the provided training data. | Relaxed and strict Precision/Recall/F1-scores |
| Gao et al. [49] | Relation Classification | Propose FewRel 2.0, a new task covering two real-world issues that FewRel ignores: few-shot domain adaptation and few-shot none-of-the-above detection. | Accuracy |
| Lara-Clares et al. [51] | NER | Hybrid Bi-LSTM and CNN model to recognize multi-word entities. Learns high-level features from datasets using a few-shot learning model. Wikipedia2vec is used for automatic extraction and classification of keywords. | F1-score |
| Ferré et al. [53] | Entity Normalization | A new neural approach (C-Norm) which synergistically combines standard and weak supervision, ontological knowledge integration and distributional semantics. | The official evaluation tool of the BB-norm task: a similarity score and a strict exact-match score. |
| Hou et al. [55] | Slot Tagging (NER) | Introduction of a collapsed dependency transfer mechanism into CRF to transfer abstract label dependency patterns in the form of transition scores. The emission score of CRF is computed as the word similarity with respect to each label representation. A Label-enhanced Task-Adaptive Projection Network (L-TapNet) based on TapNet is used to compute the similarity by leveraging label name semantics in representing labels. | F1-score |
| Sharaf et al. [57] | Neural Machine Translation (NMT) | Framing the adaptation of NMT systems as a meta-learning problem. The model can learn to adapt to new unseen domains based on simulated offline meta-training domain adaptation tasks. | BLEU, SacreBLEU (to measure case-sensitive de-tokenized BLEU) |
| Lu et al. [59] | Multi-label Text Classification | A simple multi-graph aggregation model that fuses knowledge from multiple label graphs encoding different semantic label relationships to incorporate aggregated knowledge in multi-label zero/few-shot document classification. Three kinds of semantic information are used: pre-trained word embeddings; label description; pre-defined label relations. | Recall@K, nDCG@K |
| Jia et al. [61] | NER | Creation of distinct feature distributions for each entity type across domains, which improves transfer learning power, as compared to representation networks that do not explicitly differentiate between entity types. | F1-score |
| Chalkidis et al. [66] | Multi-label Text Classification | Hierarchical methods based on Probabilistic Label Trees (PLTs); combines BERT with LWAN; uses structural information from the label hierarchy in LWAN. Leverages the label hierarchy to improve few- and zero-shot learning. | R-Precision@K (a top-K version of R-Precision of each document), nDCG@K |
| Lwowski et al. [68] | Text Classification | A self-supervised learning algorithm to monitor COVID-19 on Twitter, using an autoencoder to learn latent representations. Knowledge is transferred to a COVID-19 infection classifier by fine-tuning a Multi-Layer Perceptron (MLP) using few-shot learning. | Accuracy, Precision, Recall, F1-score |
| Hou et al. [9] | Dialogue Language Understanding with two sub-tasks: Intent Detection (classification) and Slot Tagging (sequence labeling) | A novel few-shot learning benchmark for NLP (FewJoint). Introduces few-shot joint dialogue language understanding, which additionally covers the problems of structure prediction and multi-task reliance. | Intent Accuracy, Slot F1-score, Sentence Accuracy |
| Chen et al. [70] | Natural Language Generation (NLG) | The design of the model architecture is based on two aspects: content selection from input data and language modeling to compose coherent sentences, which can be acquired from prior knowledge. | BLEU-4, ROUGE-4 (F-measure) |
| Vaci et al. [72] | Concept Extraction | Used a combination of methods to extract salient information from electronic health records. First, clinical experts defined the information of interest and subsequently built the training and testing corpora for statistical models. Second, the statistical models were built and fine-tuned using active learning procedures. | Precision, Recall, F1-score |
| Huang et al. [73] | NER | The first systematic study for few-shot NER. Three distinctive schemes (and their combinations) are investigated: (1) meta-learning to construct entity prototypes; (2) supervised pre-training to obtain generic entity representations; (3) self-supervised training to utilize unlabeled in-domain data. | F1-score |
| Chen et al. [74] | Classification | A classification and diagnosis method for Alzheimer’s patients based on multi-modal feature fusion and small-sample learning. A compressed interactive network explicitly fuses the extracted features at the vector level. Finally, a KNN attention pooling layer and a convolutional network are used to construct a small-sample learning network to classify the patient diagnosis data. | Accuracy, F1-score |
| Yin et al. [75] | Sequence Tagging (Event trigger identification) | Combination of a prototypical network and a relation network module to model the task of biomedical event trigger identification. In addition, to make full use of the external knowledge base to learn the complex biological context, a self-attention mechanism is introduced. | F1-score |
| Goodwin et al. [77] | Abstractive Summarization | Highly-abstractive multi-document summarization conditioned on user-defined query using BART, T5, and PEGASUS. | ROUGE-1, ROUGE-2, ROUGE-L F1-scores, BLEU-4, Repetition Rate |
| Yang et al. [78] | NER | Uses an NER model trained under supervision on source domain for feature extraction. Structured decoding is used with nearest neighbor learning instead of expensive CRF training. | F1-score |
| Hartmann et al. [82] | Concept Extraction | A universal approach to multilingual negation scope resolution: zero-shot cross-lingual transfer for negation scope resolution in clinical text. Exploits data from disparate sources by data concatenation, or in a multi-task learning (MTL) setup. | Percentage of correct spans (PCS), F1-score over scope tokens |
| Fivez et al. [86] | Name Normalization | Propose truly robust representations, which capture more domain-specific semantics while remaining universally applicable across different biomedical corpora and domains. Use conceptual grounding constraints which more effectively align encoded names to pretrained embeddings of their concept identifiers. | For synonym retrieval: Mean average precision (mAP) over all synonyms. For concept mapping: Accuracy (Acc) and Mean reciprocal rank (MRR) of the highest ranked correct synonym. |
| Lu et al. [87] | Rumor Detection¹ | A few-shot learning-based multi-modality fusion model for COVID-19 rumor detection. It includes a text embedding module with a pre-trained BERT model, a feature extraction module with multi-layer Bi-GRUs, and a multi-modality feature fusion module with a fusion layer. Uses a meta-learning-based few-shot learning paradigm. | Accuracy |
| Ma et al. [89] | Drug-response Predictions | Applied the few-shot learning paradigm to three context-transfer challenges: transfer of a predictive model learned in one tissue type to the distinct contexts of other tissues; transfer of a predictive model learned in tumor cell lines to patient-derived tumor cell (PDTC) cultures in vitro; transfer of a predictive model learned in tumor cell lines to the context of patient-derived tumor xenografts (PDXs) in mice in vivo. | Accuracy, Pearson’s correlation, AUC |
| Kormilitzin et al. [90] | NER | Self-supervised training of deep neural network language model using the cloze-style approach. Synthetic training data with noisy labels is created using weak supervision. All constituent components are combined into an active learning approach. | Accuracy, Precision, Recall, F1-score |
| Guo et al. [91] | Entity Relation Extraction | A Siamese graph neural network (BioGraphSAGE) with structured databases as domain knowledge to extract biological entity relations from literature. | Precision (P), Recall (R), F1-score |
| Lee et al. [92] | Fact-Checking (Text Classification) | Propose evidence-conditioned perplexity, a novel way of leveraging the perplexity score from LMs for the few-shot fact-checking task. | Accuracy, Macro-F1-score |
| Fivez et al. [96] | Name Normalization | A scalable few-shot learning approach for robust biomedical name representations. Training a simple encoder architecture in a few-shot setting using small subsamples of general higher-level concepts which span a large range of fine-grained concepts. | Spearman’s rank correlation coefficient |
| Xiao et al. [97] | Relation Classification | Adaptive prototypical networks with label words and joint representation learning based on metric learning for few-shot relation classification (FSRC), which performs classification by calculating distances in the learned metric space. | Accuracy |
| Ziletti et al. [98] | Medical Coding (classification) | Combines traditional BERT-based classification with task-aware representation of sentences, a zero/few-shot learning approach that leverages label semantics. | Accuracy |
| Ye et al. [99] | Cross-task Generalization | Present CROSSFIT, a few-shot learning challenge to acquire, evaluate and analyze cross-task generalization in a realistic setting. Additionally, introduce the NLP Few-shot Gym, a repository of 160 few-shot NLP tasks gathered from open-access resources. | Average Relative Gain (ARG) |
| Aly et al. [100] | NER and classification (NERC) | Present the first approach for zero-shot NERC, using transformers with cross-attention to leverage naturally occurring entity type descriptions. The negative class is modeled by: (1) description-based encoding, (2) independent (direct) encoding, and (3) class-aware encoding. | F1-score |
| Wright et al. [101] | Exaggeration Detection¹ (Information Extraction) | Propose multi-task Pattern Exploiting Training (MT-PET) to leverage knowledge from auxiliary cloze-style QA tasks for few-shot learning. Present a set of labeled press release/abstract pairs from existing expert-annotated studies on exaggeration in the press releases of scientific papers, suitable for benchmarking the performance of machine learning models. | Precision, Recall, F1-score |
| Lee et al. [102] | NER | Present a simple demonstration-based learning method for NER, which lets the input be prefaced by task demonstrations for in-context learning, and perform a systematic study on demonstration strategy regarding what to include, how to select the examples, and what templates to use. | F1-score |
| Wang et al. [103] | Classification | Propose a prompt-based learning approach, which treats the assertion classification task as a masked language auto-completion problem. | Comprehensiveness, Sufficiency (measuring the extent to which the model adheres to human rationales) |
| Yan et al. [104] | NER | Proposes a text mining pipeline for enabling the FAIR neuroimaging study. In order to avoid fragmented information, the Brain Informatics provenance model is redesigned based on NIDM (Neuroimaging Data Model) and FAIR facets. | Precision, Recall, F1-score |
| Lin et al. [105] | Information Extraction | Proposes a literature mining-based approach for research sharing-oriented neuroimaging provenance construction. A joint extraction model based on deep adversarial learning, called AT-NeuroEAE, is proposed to realize the event extraction in a few-shot learning scenario. | Precision, Recall, F1-score |
| Riveland et al. [106] | Classification | Present neural models of one of humans’ most astonishing cognitive feats: the ability to interpret linguistic instructions in order to perform novel tasks with just a few practice trials. Models are trained on a set of commonly studied psychophysical tasks, and receive linguistic instructions embedded by transformer architectures pre-trained on natural language processing. | Accuracy |
| Navarro et al. [107] | Abstractive Summarization² | Fine-tuned several state-of-the-art (SOTA) models on a newly created medical dialogue dataset of 143 snippets, based on 27 general practice conversations paired with their respective summaries. | ROUGE scores |
| Das et al. [108] | NER | Present CONTAINER, a novel contrastive learning technique that optimizes the inter-token distribution distance for few-shot NER. Instead of optimizing class-specific attributes, CONTAINER optimizes a generalized objective of differentiating between token categories based on their Gaussian-distributed embeddings. | F1-score |
| Ma et al. [109] | NER | Leveraging the semantic information in the names of the labels as a way of giving the model additional signal and enriched priors. Propose a neural architecture consisting of two BERT encoders, one for document encoding and another for label encoding. | F1-score |
| Parmar et al. [110] | Multi-Task Learning² | Explores the impact of instructional prompts for biomedical MTL. Introduce BoX, a collection of 32 instruction tasks for Biomedical NLP across various categories. Propose a unified model (In-BoXBART) using this meta-dataset that can jointly learn all BoX tasks without any task-specific modules. | ROUGE-L, F1-score |
| Boulanger et al. [111] | NER | Use the generative capacity of LLMs to create unlabelled synthetic data. Semi-supervised learning is used for NER in a low resource setup. | F1-score |
| Yeh et al. [112] | Relation Extraction | Present a simple yet effective method to systematically generate comprehensive prompts that reformulate the relation extraction task as a cloze-test task under a simple prompt formulation. In particular, experiment with different ranking scores for prompt selection. | F1-score |
| Pan et al. [113] | Question Answering | Supervised pretraining on source-domain data to reduce sample complexity on domain-specific downstream tasks. Zero-shot performance on domain-specific reading comprehension tasks is evaluated by combining task transfer with domain adaptation to fine-tune a pre-trained model with no labelled data from the target task. | F1-score |
| Wadden et al. [114] | Scientific Claim Verification | Present MULTIVERS, which predicts a fact-checking label and identifies rationales in a multitask fashion based on a shared encoding of the claim and full document context using weakly-supervised domain adaptation. | Precision, Recall, F1-score |
| Li et al. [115] | Relation Classification | Learn a prototype encoder from relation definition text in a way that is useful for relation instance classification. Use a joint training approach to train both a prototype encoder from definition and an instance encoder. | Accuracy |
| Zhang et al. [116] | Natural Language Inference (NLI) | An instance-discrimination-based approach (PairSupCon) to bridge semantic entailment and contradiction understanding with high-level categorical concept encoding. | Clustering Accuracy |
¹ Denotes papers where a new non-biomedical FSL dataset is introduced.
² Denotes papers where a new FSL dataset specific to the biomedical domain is introduced.
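Several of the studies above (e.g., Huang et al. [73], Yin et al. [75], Xiao et al. [97], Li et al. [115]) build on prototypical networks, which classify a query by its distance to per-class prototypes averaged from a handful of support examples. The following is a minimal illustrative sketch of that shared mechanism only — toy 2-d embeddings and plain Euclidean distance, not any specific paper's encoder or metric:

```python
import math

def prototypes(support, labels, n_classes):
    """Class prototype = mean embedding of that class's support examples."""
    protos = []
    for c in range(n_classes):
        members = [emb for emb, lab in zip(support, labels) if lab == c]
        protos.append([sum(dim) / len(members) for dim in zip(*members)])
    return protos

def classify(query, protos):
    """Assign the query to the nearest prototype by Euclidean distance."""
    dists = [math.dist(query, p) for p in protos]
    return dists.index(min(dists))

# Toy 2-way 2-shot episode with 2-d "embeddings"
support = [[0.0, 0.0], [0.2, 0.1],   # class 0
           [1.0, 1.0], [0.9, 1.1]]   # class 1
labels = [0, 0, 1, 1]
protos = prototypes(support, labels, n_classes=2)
print([classify(q, protos) for q in [[0.1, 0.0], [1.0, 0.9]]])  # → [0, 1]
```

In the papers above, the raw lists would be replaced by learned embeddings (e.g., BERT outputs), and the distance function itself may be learned, as in the metric-learning variants of [97] and [115].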