Table 3.
Architectural details of all models (n = 91) from all 82 articles included in the study
| Backbone model (references) | Architecture | Models, n (% of all 91) | Model size (parameters) | Models, n (% of backbone) | Variant used as backbone | Models, n (% of backbone) |
|---|---|---|---|---|---|---|
| Llama (7, 8, 14, 15, 62–91) | Decoder-only | 36 (39.6%) | 7B | 17 (47.2%) | Llama-base | 28 (77.8%) |
| | | | 13B | 14 (38.9%) | Alpaca | 3 (8.3%) |
| | | | 70B | 2 (5.6%) | Vicuna | 2 (5.6%) |
| | | | 33B | 2 (5.6%) | AlpaCare | 1 (2.8%) |
| | | | 65B | 1 (2.8%) | Orca | 1 (2.8%) |
| | | | | | Ziya | 1 (2.8%) |
| GPT (17, 73, 74, 84, 92–104) | Decoder-only | 16 (17.6%) | 1.5B | 8 (50.0%) | GPT-base | 14 (87.5%) |
| | | | 175B | 2 (12.5%) | BioGPT | 2 (12.5%) |
| | | | 6.7B | 2 (12.5%) | | |
| | | | 2.7B | 1 (6.3%) | | |
| | | | 20B | 1 (6.3%) | | |
| | | | 6B | 1 (6.3%) | | |
| | | | 1.3B | 1 (6.3%) | | |
| ChatGLM (105–111) | Encoder–decoder | 7 (7.7%) | 6B | 7 (100.0%) | ChatGLM-base | 7 (100.0%) |
| T5 (88, 112–116) | Encoder–decoder | 6 (6.6%) | 11B | 4 (66.7%) | Flan-T5 | 3 (50.0%) |
| | | | 3B | 2 (33.3%) | T5-base | 1 (16.7%) |
| | | | | | mT5 | 1 (16.7%) |
| | | | | | ProtT5 | 1 (16.7%) |
| Baichuan (75, 117–121) | Decoder-only | 6 (6.6%) | 7B | 4 (66.7%) | Baichuan-base | 6 (100.0%) |
| | | | 13B | 2 (33.3%) | | |
| From scratch (122–124) | Decoder-only | 3 (3.3%) | 6.4B | 1 (33.3%) | ProGen | 1 (33.3%) |
| | | | 2.5B | 1 (33.3%) | ProGen2 | 1 (33.3%) |
| | | | 1.2B | 1 (33.3%) | Nucleotide Transformer | 1 (33.3%) |
| BLOOM (125–127) | Decoder-only | 3 (3.3%) | 7B | 2 (66.7%) | BLOOM-base | 3 (100.0%) |
| | | | 1B | 1 (33.3%) | | |
| Qwen (9, 128) | Decoder-only | 2 (2.2%) | 14B | 1 (50.0%) | Qwen-base | 2 (100.0%) |
| | | | 7B | 1 (50.0%) | | |
| PaLM (11, 129) | Decoder-only | 2 (2.2%) | 540B | 1 (50.0%) | PaLM-base | 2 (100.0%) |
| | | | 340B | 1 (50.0%) | | |
| LongNet (130) | Decoder-only | 1 (1.1%) | 1B | 1 (100.0%) | LongNet-base | 1 (100.0%) |
| GenSLM (131) | Decoder-only | 1 (1.1%) | 25B | 1 (100.0%) | GenSLM-base | 1 (100.0%) |
| Hyena (132) | Decoder-only | 1 (1.1%) | 7B | 1 (100.0%) | StripedHyena | 1 (100.0%) |
| Multimodal (14, 87, 91, 96, 130, 133, 134) | Mixed | 7 (7.7%) | NA | NA | ViT | 4 (57.1%) |
| | | | | | BLIP | 1 (14.3%) |
| | | | | | CLIP | 1 (14.3%) |
| | | | | | LLaVA | 1 (14.3%) |
Abbreviation: NA, not applicable.
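
The percentages in Table 3 appear to use two different denominators: the backbone column is expressed relative to all 91 models, while the model-size and variant columns are expressed relative to the number of models within each backbone. The following minimal sketch reproduces the Llama cells under that assumption; the helper `share` and the variable names are illustrative and do not come from the study.

```python
def share(count: int, total: int) -> str:
    """Format a count as 'n (p%)' with one decimal place, as in Table 3."""
    return f"{count} ({count / total * 100:.1f}%)"


# Counts copied from the Llama rows of Table 3.
TOTAL_MODELS = 91
llama_sizes = {"7B": 17, "13B": 14, "70B": 2, "33B": 2, "65B": 1}

# The per-size counts should sum to the backbone total (36 for Llama).
llama_total = sum(llama_sizes.values())

# Backbone share is relative to all models in the study.
backbone_cell = share(llama_total, TOTAL_MODELS)

# Size shares are relative to the models built on this backbone.
size_cells = {size: share(n, llama_total) for size, n in llama_sizes.items()}

print(backbone_cell)
for size, cell in size_cells.items():
    print(size, cell)
```

The same check can be applied to any other backbone by replacing the counts. Note that Python's formatting rounds exact halves to the nearest even digit, so a value such as 1/16 = 6.25% may print as 6.2% rather than the 6.3% shown in the table.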