Author manuscript; available in PMC: 2021 Jun 22.
Published in final edited form as: Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2020 Aug 5;2020:190–199. doi: 10.1109/cvpr42600.2020.00027

Table 3: Ablation study of the proposed state representation—dynamic contextual belief.

The full state (DCB-full) consists of 1 history map, 1 saliency map, 54 stuff maps, 79 context maps, and 1 target map. In each ablation, we mask out one component by setting its map(s) to zero. See the supplementary material for full results.

| | Sequence Score ↑ | Scanpath Ratio ↑ | Prob. Mismatch ↓ |
|---|---|---|---|
| DCB-full | 0.422 | 0.803 | 1.029 |
| w/o history map | 0.419 | 0.800 | 1.042 |
| w/o saliency map | 0.419 | 0.795 | 1.029 |
| w/o stuff maps | 0.407 | 0.777 | 1.248 |
| w/o thing maps | 0.331 | 0.487 | 3.152 |
| w/o target map | 0.338 | 0.519 | 2.926 |
| DCB | 0.422 | 0.826 | 0.987 |
| CFI | 0.402 | 0.619 | 1.797 |
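
The masking procedure described in the Table 3 caption (zeroing one group of maps at a time) can be illustrated with a minimal NumPy sketch. The channel ordering, the `GROUP_SIZES` dictionary, the helper names, and the 20×32 spatial size are all assumptions made for illustration; only the group sizes (1, 1, 54, 79, 1) come from the caption, and this is not the authors' implementation.

```python
import numpy as np

# Group sizes are taken from the caption; the ORDER of the groups along the
# channel axis is a hypothetical layout chosen only for this example.
GROUP_SIZES = {"history": 1, "saliency": 1, "stuff": 54, "context": 79, "target": 1}


def group_slices(sizes):
    """Map each group name to the slice of channels it occupies."""
    slices, start = {}, 0
    for name, count in sizes.items():
        slices[name] = slice(start, start + count)
        start += count
    return slices


def mask_group(state, group, sizes=GROUP_SIZES):
    """Return a copy of `state` (channels, H, W) with one group's maps zeroed."""
    masked = state.copy()
    masked[group_slices(sizes)[group]] = 0.0
    return masked


# Random stand-in for a full state; the spatial size is arbitrary.
rng = np.random.default_rng(0)
state = rng.random((sum(GROUP_SIZES.values()), 20, 32))

# Example ablation: the "w/o stuff maps" condition.
ablated = mask_group(state, "stuff")
```

Because `mask_group` works on a copy, the same full state can be reused to build every ablated variant in turn.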