Table 6. Training settings of language models investigated in this study

| Language model | Number of trainable parameters | Learning rate | Total batch size | Number of training epochs | Implementation and pretrained weights |
| --- | --- | --- | --- | --- | --- |
| PGN | 8.3 M | 1e-3* | 25* | 30* | https://github.com/yuhaozhang/summarize-radiology-findings |
| BERT2BERT | 301.7 M | 1e-4 | 32 | 15 | https://huggingface.co/yikuan8/Clinical-Longformer |
| BART | 406.3 M | 5e-5 | 32 | 15 | https://huggingface.co/facebook/bart-large |
| BioBART | 406.3 M | 5e-5 | 32 | 15 | https://huggingface.co/GanjinZero/biobart-large |
| PEGASUS | 568.7 M | 2e-4 | 32 | 15 | https://huggingface.co/google/pegasus-large |
| T5 | 783.2 M | 4e-4 | 32 | 15 | https://huggingface.co/google/t5-v1_1-large |
| Clinical-T5 | 737.7 M | 4e-4 | 32 | 15 | https://huggingface.co/luqh/ClinicalT5-large |
| FLAN-T5 | 783.2 M | 4e-4 | 32 | 15 | https://huggingface.co/google/flan-t5-large |
| GPT2 | 1.5 B | 5e-5 | 32 | 15 | https://huggingface.co/gpt2-xl |
| OPT | 1.3 B | 1e-4 | 32 | 15 | https://huggingface.co/facebook/opt-1.3b |
| LLaMA-LoRA | 4.2 M | 2e-4 | 128 | 20 | Available upon request |
| Alpaca-LoRA | 4.2 M | 2e-4 | 128 | 20 | https://huggingface.co/tatsu-lab/alpaca-7b-wdiff |

Note: “*” denotes hyperparameters taken directly from the original paper. Total batch size = training batch size per device × number of GPU devices × gradient accumulation steps.
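
To make the batch-size relationship concrete, below is a minimal sketch, assuming the Hugging Face Trainer API was used for the fully fine-tuned models; it instantiates training arguments matching the BART row. The per-device batch size, GPU count, and accumulation steps are illustrative assumptions; only their product (32), the learning rate, and the epoch count are reported in the table.

```python
# Minimal sketch (assumption): training arguments consistent with the BART row
# of Table 6, expressed with the Hugging Face Trainer API. Only the totals come
# from the table; the per-device batch size, GPU count, and accumulation steps
# are illustrative.
from transformers import Seq2SeqTrainingArguments

per_device_batch_size = 8        # illustrative
num_gpu_devices = 4              # illustrative
gradient_accumulation_steps = 1  # illustrative

# Total batch size = training batch size per device x number of GPU devices
#                    x gradient accumulation steps
total_batch_size = per_device_batch_size * num_gpu_devices * gradient_accumulation_steps
assert total_batch_size == 32    # matches the "Total batch size" column for BART

training_args = Seq2SeqTrainingArguments(
    output_dir="./bart-large-finetuned",   # hypothetical output path
    learning_rate=5e-5,                    # from Table 6
    per_device_train_batch_size=per_device_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    num_train_epochs=15,                   # from Table 6
    predict_with_generate=True,
)
```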
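For the parameter-efficient rows (LLaMA-LoRA and Alpaca-LoRA), the sketch below shows a PEFT LoRA configuration whose trainable-parameter count is consistent with the reported 4.2 M. The rank and target modules are assumptions, not configuration details taken from the paper: rank-8 adapters on the query and value projections of a 7B LLaMA give 32 layers × 2 modules × 2 × 8 × 4096 ≈ 4.19 M trainable parameters.

```python
# Minimal sketch (assumption): a PEFT LoRA configuration consistent with the
# 4.2 M trainable parameters reported for LLaMA-LoRA / Alpaca-LoRA.
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # LoRA rank (illustrative)
    lora_alpha=16,                        # illustrative scaling factor
    lora_dropout=0.05,                    # illustrative
    target_modules=["q_proj", "v_proj"],  # attention query/value projections
)
```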