Table 2.
Overview of pre-trained language models in biomedicine with release dates
Model Name | Corpora | LLM Backbone | Release Date |
---|---|---|---|
BioBERT | PubMed abstracts, PMC articles | BERT | 2020 |
Med-BERT | Medical texts, EHRs | BERT | 2021 |
ClinicalBERT | MIMIC-III clinical notes | BERT | 2019 |
SciBERT | Scientific papers (82% biomedical) | BERT | 2019 |
COVID-Twitter-BERT | Tweets about COVID-19 | BERT | 2023 |
MedGPT | Electronic health records (EHRs) | GPT | 2021 |
SciFive | Biomedical corpora | T5 | 2021 |
LLMBiomedicine | Biomedical texts (NER [214] tasks) | GPT-4 | 2024 |
ClinicalGPT | Diverse medical data | GPT | 2023 |
MultiMedQA | Medical QA datasets | PaLM [46] | 2023 |
ChatDoctor | Patient-physician conversations | LLaMA | 2023 |
Taiyi | Biomedical texts, multilingual | Qwen [215] | 2024 |