2024 Nov 22;14:29016. doi: 10.1038/s41598-024-80565-1

Figure 2.

Biological event extraction (BEE) pipeline. Raw videos with T frames are cropped and normalized to a sequence of N frames, where N is the number of frames that can span from 80 to 150 hpi. For each frame of interest, a sub-clip of 7 frames is constructed with that frame as the central frame. These sub-clips are passed to a Transformer model to extract 512-d embeddings. The sequence of N embeddings is fed into a GRU network to output N × C scores, where C is the number of classes. These scores are then converted to predictions using an HMM.
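
The first and last stages of the pipeline described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `build_subclips` and `viterbi` are hypothetical, clamping sub-clips at the video borders is an assumption, and the stay-or-advance transition matrix and the per-frame scores are toy values chosen only for illustration. The Transformer and GRU stages are not modeled; the scores stand in for the N × C GRU output.

```python
import math

CLIP_LEN = 7  # sub-clip length stated in the caption


def build_subclips(num_frames, clip_len=CLIP_LEN):
    """For each frame index, return the indices of a clip_len window centred on it.

    Assumption: windows that run past the video borders are clamped to the
    first or last frame. The caption does not say how borders are handled.
    """
    half = clip_len // 2
    return [
        [min(max(t + k, 0), num_frames - 1) for k in range(-half, half + 1)]
        for t in range(num_frames)
    ]


def viterbi(log_scores, log_trans, log_prior):
    """Most likely class path for N x C per-frame log-scores.

    log_trans is a C x C log transition matrix, log_prior a length-C log prior.
    """
    num_classes = len(log_prior)
    best = [log_prior[j] + log_scores[0][j] for j in range(num_classes)]
    backpointers = []
    for t in range(1, len(log_scores)):
        prev = best
        # Best predecessor class for each current class j.
        ptr = [
            max(range(num_classes), key=lambda i: prev[i] + log_trans[i][j])
            for j in range(num_classes)
        ]
        best = [
            prev[ptr[j]] + log_trans[ptr[j]][j] + log_scores[t][j]
            for j in range(num_classes)
        ]
        backpointers.append(ptr)
    # Trace the best path backwards from the best final class.
    path = [max(range(num_classes), key=lambda j: best[j])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy example with N = 5 frames and C = 3 classes (illustrative values only).
NEG = -1e9  # stands in for log(0)
toy_probs = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],  # noisy frame: the per-frame argmax falls back to class 0
    [0.1, 0.7, 0.2],
    [0.1, 0.2, 0.7],
]
log_scores = [[math.log(p) for p in row] for row in toy_probs]

# Assumed transition structure: a class may persist or advance to the next one.
log_trans = [
    [math.log(0.5), math.log(0.5), NEG],
    [NEG, math.log(0.5), math.log(0.5)],
    [NEG, NEG, 0.0],
]
log_prior = [0.0, NEG, NEG]  # assumed: every sequence starts in class 0

argmax_path = [max(range(3), key=lambda j: row[j]) for row in toy_probs]
hmm_path = viterbi(log_scores, log_trans, log_prior)
# argmax_path == [0, 1, 0, 1, 2]; hmm_path == [0, 1, 1, 1, 2]
```

In this toy case the per-frame argmax yields an out-of-order sequence, while the decoded path respects the assumed ordering of classes, which is the kind of temporal smoothing an HMM stage can provide on top of frame-level scores.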