Figure 1.
Schematic of regLM. (A, B) DNA sequences are prefixed with a sequence of prompt tokens representing functional labels. (C) A HyenaDNA model is trained or fine-tuned to perform next-token prediction on the labeled sequences. (D) The trained model is prompted with a sequence of prompt tokens encoding the desired labels to generate sequences with the corresponding properties. (E, F) A sequence-to-function regression model trained on the same data set is used to check and filter the generated sequences. (G) The regulatory content of the generated sequences is evaluated.
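
The labeling and filtering steps in panels A, B, E, and F can be illustrated with a minimal Python sketch. It is not the regLM implementation: the function names (`label_to_token`, `add_prompt`, `filter_generated`), the cutoff-based binning of activities into digit tokens, and the `predict` callable standing in for the regression model are all assumptions made for illustration.

```python
from bisect import bisect_right


def label_to_token(value, cutoffs):
    """Map a continuous activity value to a class token ("0", "1", ...).

    `cutoffs` is a sorted list of bin boundaries (hypothetical values).
    """
    return str(bisect_right(cutoffs, value))


def add_prompt(seq, activities, cutoffs):
    """Prefix a DNA sequence with one label token per measured activity."""
    prefix = "".join(label_to_token(a, cutoffs) for a in activities)
    return prefix + seq


def filter_generated(seqs, prompt, predict, cutoffs):
    """Keep generated sequences whose predicted label tokens match the prompt.

    `predict` is a stand-in for a sequence-to-function regression model:
    any callable that returns one activity value per label.
    """
    kept = []
    for s in seqs:
        predicted = "".join(label_to_token(a, cutoffs) for a in predict(s))
        if predicted == prompt:
            kept.append(s)
    return kept
```

For example, with cutoffs `[0.25, 0.75]`, a sequence `"ACGT"` with activities `[0.1, 0.9]` would be stored for training as `"02ACGT"`. At generation time, the same two-token prefix would serve as the prompt, and `filter_generated` would discard any generated sequence whose predicted activities fall into different bins.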