Table 2.
Data set | Task | Domain | Size | Example |
STS-B^a | Sentence pair similarity | General | 8600 | Sentence 1: “A young child is riding a horse”; Sentence 2: “A child is riding a horse”; Similarity: 4.75 |
RQE^b | Sentence pair classification | Biomedical | 8900 | Sentence 1: “Doctor X thinks he is probably just a normal 18 month old but would like to know if there are a certain number of respiratory infections that are considered normal for that age”; Sentence 2: “Probably a normal 18 month old but how many respiratory infections are normal”; Ground truth: entailment |
MedNLI^c | Sentence pair classification | Clinical | 14,000 | Sentence 1: “Labs were notable for Cr 1.7 (baseline 0.5 per old records) and lactate 2.4”; Sentence 2: “Patient has normal Cr”; Ground truth: contradiction |
QQP^d | Sentence pair classification | General | 400,000 | Sentence 1: “Why do rockets look white?”; Sentence 2: “Why are rockets and boosters painted white?”; Ground truth: 1 |
Topic | Sentence classification | Clinical | 1,300,000 | Sentence: “Negative for difficulty urinating, pain with urination, and frequent urination”; Ground truth: SIGNORSYMPTOM |
MedNER^e | Token-wise classification | Clinical | 15,000 | Sentence: “he developed respiratory distress on the AM^f of admission, cough day PTA^g, CXR^h with B/L^i LL^j PNA^k, started ciprofloxacin and levofloxacin”; Ground truth: ciprofloxacin [DRUG] levofloxacin [DRUG] |
^a STS-B: semantic textual similarity benchmark.
^b RQE: Recognizing Question Entailment.
^c MedNLI: natural language inference data set for the clinical domain.
^d QQP: Quora Question Pairs.
^e MedNER: medication named entity recognition.
^f AM: morning.
^g PTA: prior to admission.
^h CXR: chest x-ray.
^i B/L: bilateral.
^j LL: left lower.
^k PNA: pneumonia.
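
To make the task types in the table concrete, the following minimal sketch shows one way the examples could be held in memory. It is an illustration only, not the format used by the data sets or by the authors. The names `PairExample` and `token_labels`, the whitespace tokenization, and the `"O"` label for tokens outside any entity are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class PairExample:
    """Hypothetical container for the sentence pair tasks in the table."""
    sentence_1: str
    sentence_2: str
    # A float score for similarity tasks, a label for classification tasks.
    target: Union[float, int, str]


def token_labels(sentence: str, entities: Dict[str, str],
                 outside: str = "O") -> List[Tuple[str, str]]:
    """Assign one label per whitespace token (assumed tokenization).

    Tokens whose lowercased form, with trailing punctuation stripped,
    appears in `entities` receive that label; all others get `outside`.
    """
    labeled = []
    for token in sentence.split():
        key = token.strip(",.;:").lower()
        labeled.append((token, entities.get(key, outside)))
    return labeled


# Sentence pair similarity (STS-B row): the target is a real-valued score.
sts_b = PairExample("A young child is riding a horse",
                    "A child is riding a horse", 4.75)

# Sentence pair classification (MedNLI row): the target is a class label.
mednli = PairExample(
    "Labs were notable for Cr 1.7 (baseline 0.5 per old records) and lactate 2.4",
    "Patient has normal Cr", "contradiction")

# Token-wise classification (MedNER row): one label per token.
ner_sentence = ("he developed respiratory distress on the AM of admission, "
                "cough day PTA, CXR with B/L LL PNA, started ciprofloxacin "
                "and levofloxacin")
ner = token_labels(ner_sentence,
                   {"ciprofloxacin": "DRUG", "levofloxacin": "DRUG"})
drugs = [token for token, label in ner if label == "DRUG"]
```

Here `drugs` collects the tokens tagged `DRUG`, which corresponds to the ground truth shown in the MedNER row; the sentence classification task (Topic row) would need only a single sentence and a single label.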