[Preprint]. 2023 Oct 17:arXiv:2306.10070v2. [Version 2]

Figure 1.


The paradigm of LLMs. Pre-training: LLMs are trained on a large-scale corpus with an autoregressive language modeling objective. Instruction fine-tuning: the pre-trained LLM is fine-tuned with supervised learning on a dataset of human-written demonstrations of the desired output behavior on prompts. RLHF fine-tuning: a reward model is trained on collected comparison data, and the supervised model is then further fine-tuned against the reward model using a reinforcement learning algorithm.
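The reward-model step of the RLHF stage described above can be illustrated with a toy sketch. Assuming (as is common, though not stated in the caption) that the reward model assigns a scalar score to each response and is trained with a Bradley-Terry pairwise loss over the collected comparisons, the per-pair objective looks like this; the function name and values here are illustrative, not from the source:

```python
import math

def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    # Negative log-likelihood that the human-preferred ("chosen")
    # response outranks the "rejected" one under a Bradley-Terry
    # preference model: loss = -log(sigmoid(r_chosen - r_rejected)).
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# When the reward model already scores the chosen response higher,
# the loss is small; when it scores the rejected one higher, it is large.
print(round(bradley_terry_loss(2.0, 0.0), 4))  # → 0.1269
print(round(bradley_terry_loss(0.0, 2.0), 4))  # → 2.1269
```

Minimizing this loss over the comparison dataset pushes the reward model's scores to agree with human preference rankings; the resulting scalar reward is what the reinforcement learning stage then optimizes against.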