Table 6. Training settings of language models investigated in this study

| Language model | Number of trainable parameters | Learning rate | Total batch size | Number of training epochs | Implementation and pretrained weights |
| --- | --- | --- | --- | --- | --- |
| PGN | 8.3 M | 1e-3* | 25* | 30* | https://github.com/yuhaozhang/summarize-radiology-findings |
| BERT2BERT | 301.7 M | 1e-4 | 32 | 15 | https://huggingface.co/yikuan8/Clinical-Longformer |
| BART | 406.3 M | 5e-5 | 32 | 15 | https://huggingface.co/facebook/bart-large |
| BioBART | 406.3 M | 5e-5 | 32 | 15 | https://huggingface.co/GanjinZero/biobart-large |
| PEGASUS | 568.7 M | 2e-4 | 32 | 15 | https://huggingface.co/google/pegasus-large |
| T5 | 783.2 M | 4e-4 | 32 | 15 | https://huggingface.co/google/t5-v1_1-large |
| Clinical-T5 | 737.7 M | 4e-4 | 32 | 15 | https://huggingface.co/luqh/ClinicalT5-large |
| FLAN-T5 | 783.2 M | 4e-4 | 32 | 15 | https://huggingface.co/google/flan-t5-large |
| GPT2 | 1.5 B | 5e-5 | 32 | 15 | https://huggingface.co/gpt2-xl |
| OPT | 1.3 B | 1e-4 | 32 | 15 | https://huggingface.co/facebook/opt-1.3b |
| LLaMA-LoRA | 4.2 M | 2e-4 | 128 | 20 | Available upon request |
| Alpaca-LoRA | 4.2 M | 2e-4 | 128 | 20 | https://huggingface.co/tatsu-lab/alpaca-7b-wdiff |

Note: “*” denotes hyperparameters taken directly from the original paper. Total batch size = training batch size per device × number of GPU devices × gradient accumulation steps.
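
To make the batch-size relationship concrete, below is a minimal sketch, assuming the Hugging Face Trainer API was used for the fully fine-tuned models; it instantiates training arguments matching the BART row. The per-device batch size, GPU count, and accumulation steps are illustrative assumptions; only their product (32), the learning rate, and the epoch count are reported in the table.

```python
# Minimal sketch (assumption): training arguments consistent with the BART row
# of Table 6, expressed with the Hugging Face Trainer API. Only the totals come
# from the table; the per-device batch size, GPU count, and accumulation steps
# are illustrative.
from transformers import Seq2SeqTrainingArguments

per_device_batch_size = 8        # illustrative
num_gpu_devices = 4              # illustrative
gradient_accumulation_steps = 1  # illustrative

# Total batch size = training batch size per device x number of GPU devices
#                    x gradient accumulation steps
total_batch_size = per_device_batch_size * num_gpu_devices * gradient_accumulation_steps
assert total_batch_size == 32    # matches the "Total batch size" column for BART

training_args = Seq2SeqTrainingArguments(
    output_dir="./bart-large-finetuned",   # hypothetical output path
    learning_rate=5e-5,                    # from Table 6
    per_device_train_batch_size=per_device_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    num_train_epochs=15,                   # from Table 6
    predict_with_generate=True,
)
```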
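For the parameter-efficient rows (LLaMA-LoRA and Alpaca-LoRA), the sketch below shows a PEFT LoRA configuration whose trainable-parameter count is consistent with the reported 4.2 M. The rank and target modules are assumptions, not configuration details taken from the paper: rank-8 adapters on the query and value projections of a 7B LLaMA give 32 layers × 2 modules × 2 × 8 × 4096 ≈ 4.19 M trainable parameters.

```python
# Minimal sketch (assumption): a PEFT LoRA configuration consistent with the
# 4.2 M trainable parameters reported for LLaMA-LoRA / Alpaca-LoRA.
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # LoRA rank (illustrative)
    lora_alpha=16,                        # illustrative scaling factor
    lora_dropout=0.05,                    # illustrative
    target_modules=["q_proj", "v_proj"],  # attention query/value projections
)
```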