2024 Aug 7;10(8):191. doi: 10.3390/jimaging10080191

© 2024 by the authors.

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Top-level ESFPNet architecture. The input is an $H \times H$ video frame $\hat{I}$, where $H = 352$ for our application, while the output is a $720 \times 720$ segmented frame $M$. The MiT encoder of Xie et al. serves as the network backbone [24], while the Efficient Stage-Wise Feature Pyramid (ESFP) serves as the decoder. The four layers constituting the ESFP are (1) the basic prediction (BP) layer, given by blocks BP #1 through BP #4; (2) the aggregating fusion (AF) layer, given by blocks AF #1 through AF #3; (3) the aggregating prediction (AP) layer, given by blocks AP #1 through AP #3; and (4) the multi-stage fusion (MF) layer. The “Final Segment” block produces the final segmented video frame. Quantities such as $F_{1}, F_{2}, \dots, F^{MF}$ denote the feature tensors produced by each network block, while quantities of the form “$A \times A \times C_{i}$” specify the feature tensor dimensions; e.g., the dimensions of $F_{1}$ are $\frac{H}{4} \times \frac{H}{4} \times C_{1}$.
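
The dimension notation can be illustrated with a short sketch that computes the encoder feature-tensor shapes for $H = 352$. The caption only states the shape of $F_{1}$; the downsampling strides (4, 8, 16, 32) and the channel widths used below are assumptions taken from the general MiT design (the widths match the MiT-B1 and larger variants), and the function name `stage_shapes` is hypothetical.

```python
def stage_shapes(h, strides=(4, 8, 16, 32), channels=(64, 128, 320, 512)):
    """Return (A, A, C_i) for each encoder stage of an h x h input frame.

    strides and channels are assumed values, not taken from the caption:
    the caption only confirms the stride of 4 for F_1.
    """
    return [(h // s, h // s, c) for s, c in zip(strides, channels)]


# H = 352 is the input frame size given in the caption.
shapes = stage_shapes(352)
for i, shape in enumerate(shapes, start=1):
    print(f"F_{i}: {shape[0]} x {shape[1]} x {shape[2]}")
```

Under these assumptions, $F_{1}$ has spatial size $88 \times 88$, consistent with $\frac{H}{4} \times \frac{H}{4}$, and the later stages shrink to $44$, $22$, and $11$ per side.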