
Table 2. Information extracted from the articles

1. What is the current landscape of biomedical LLMs?

| Feature | Example of value |
| --- | --- |
| Architecture | Decoder |
| Release date | 2024.04 |
| Backbone | Llama |
| Modality | Text |
| Number of parameters | 10.7B |
| Tokenizer | BERT tokenizer |
| Number of attention heads | 20 |
| Number of layers | 32 |
| Hidden units | 2,560 |
| Epochs | 10 |
| Batch size | 8 |
| Sequence length | 1,024 |
| Learning rate | 2e-6/5e-6 |
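
Features such as these map naturally onto one structured record per reviewed model. Below is a minimal, purely illustrative sketch of such a record in Python, populated with the example values from the table; the class and field names are our own assumptions, not a schema defined in the review:

```python
from dataclasses import dataclass

@dataclass
class ModelCard:
    """One record per reviewed biomedical LLM (illustrative schema only)."""
    architecture: str        # e.g., "Decoder"
    release_date: str        # e.g., "2024.04"
    backbone: str            # e.g., "Llama"
    modality: str            # e.g., "Text"
    n_parameters: str        # e.g., "10.7B"
    tokenizer: str           # e.g., "BERT tokenizer"
    n_attention_heads: int   # e.g., 20
    n_layers: int            # e.g., 32
    hidden_units: int        # e.g., 2560
    epochs: int              # e.g., 10
    batch_size: int          # e.g., 8
    sequence_length: int     # e.g., 1024
    learning_rate: str       # e.g., "2e-6/5e-6" (stage-dependent rates kept as text)

# Example record built from the table's sample values
example = ModelCard(
    architecture="Decoder", release_date="2024.04", backbone="Llama",
    modality="Text", n_parameters="10.7B", tokenizer="BERT tokenizer",
    n_attention_heads=20, n_layers=32, hidden_units=2560, epochs=10,
    batch_size=8, sequence_length=1024, learning_rate="2e-6/5e-6",
)
```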
2. How are those biomedical LLMs being developed and evaluated?

| Feature | Example of value |
| --- | --- |
| Training strategy | From scratch |
| Pretraining included? | No |
| Instruction tuning included? | Yes |
| Task-specific fine-tuning included? | Yes |
| Corpus | MedQA, MedMCQA |
| Corpus type | EHR |
| Training objective | MLM, SOP |
| Number of tokens | 300B |
| Train time | ~6.25 days |
| GPUs used | 128 A100–40GB |
| Evaluation task | Text generation |
| Evaluation metric | Perplexity, BLEU, GLEU |
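
Of the evaluation metrics listed, perplexity has a direct arithmetic definition: the exponential of the average per-token negative log-likelihood on held-out text. A generic sketch of that computation (not code from the review):

```python
import math

def perplexity(total_nll: float, n_tokens: int) -> float:
    """Perplexity = exp(mean negative log-likelihood per token), in nats."""
    return math.exp(total_nll / n_tokens)

# A model averaging 1.2 nats of loss per held-out token:
print(perplexity(total_nll=1.2 * 1000, n_tokens=1000))  # ~3.32
```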
3. What are the main applications of biomedical LLMs?

| Feature | Example of value |
| --- | --- |
| NLP task | Question answering |
| Clinical application | Patient diagnosis |
| Target user | Patient, caregiver |
| Carbon footprint | 539 tCO2eq |
| Journal | JAMIA, NeurIPS |
| Institution | Google Research |
| Language | English |
| Data status | Proprietary |
| Model status | Open source |
| License | MIT |
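
Together, the three feature groups describe one extraction record per article. As a hypothetical illustration, such a record could be serialized for downstream analysis; all field names below are our own, chosen only to mirror the table:

```python
import json

# Hypothetical extraction record for one article, grouped by the
# three review questions; field names are illustrative only.
record = {
    "landscape": {
        "architecture": "Decoder", "backbone": "Llama",
        "modality": "Text", "n_parameters": "10.7B",
    },
    "development": {
        "training_strategy": "From scratch",
        "corpus": ["MedQA", "MedMCQA"],
        "evaluation_metrics": ["Perplexity", "BLEU", "GLEU"],
    },
    "application": {
        "nlp_task": "Question answering",
        "clinical_application": "Patient diagnosis",
        "model_status": "Open source", "license": "MIT",
    },
}
print(json.dumps(record, indent=2))
```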

Abbreviations: EHR, electronic health record; LLM, large language model; NLP, natural language processing; tCO2eq, tons of carbon dioxide equivalent.