Sci Rep. 2023 Jul 3;13:10713. doi: 10.1038/s41598-023-36714-z

Figure 2.

M2BERT architecture. M2BERT consists of L encoder layers, each applying multi-head attention and normalization. This architecture, similar to BERT's, is used for pretraining; a multi-task approach is then applied to enrich the output representations, yielding a set of informative, interpretable features for downstream tasks.
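
The caption outlines a BERT-like stack of L encoder layers (multi-head attention plus normalization) whose output is passed to multiple task heads. The following is a minimal PyTorch sketch of that shape only; the hyperparameters (d_model, n_heads, n_layers, n_tasks), the [CLS]-style pooling, and the per-task linear heads are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of a BERT-style encoder stack with multi-task heads.
# All hyperparameters and the per-task heads are illustrative assumptions.
import torch
import torch.nn as nn


class EncoderLayer(nn.Module):
    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Multi-head self-attention with a residual connection and LayerNorm.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Position-wise feed-forward, again with residual + LayerNorm.
        return self.norm2(x + self.ff(x))


class M2BERTSketch(nn.Module):
    """L stacked encoder layers; task-specific heads share the encoder."""

    def __init__(self, vocab_size=30522, d_model=768, n_layers=12, n_tasks=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            [EncoderLayer(d_model) for _ in range(n_layers)]
        )
        # One linear head per auxiliary task (multi-task enrichment).
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, 1) for _ in range(n_tasks)]
        )

    def forward(self, token_ids):
        x = self.embed(token_ids)
        for layer in self.layers:
            x = layer(x)
        # Pool the first token's representation (BERT-style [CLS] pooling),
        # then score it with each task head.
        pooled = x[:, 0]
        return [head(pooled) for head in self.heads]


# Example: a batch of 2 sequences, 16 tokens each.
tokens = torch.randint(0, 30522, (2, 16))
model = M2BERTSketch()
task_outputs = model(tokens)  # one tensor of shape (2, 1) per task head
```

In this sketch the multi-task enrichment is reduced to independent linear heads over a shared pooled representation; the paper's actual task definitions and head structure are not specified in the caption.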