Table 6.
Captions generated by the ablated models for four example images. Colored words mark the improvements over the previous model's caption.
| Image | Baseline | +self-att(Dec) | +self-att(Enc+Dec) | Our PW | Our CW |
|---|---|---|---|---|---|
| 1 | A couple of women standing next to each other. | A couple of women standing next to each other. | Two women are holding wine glasses in a room. | Two women standing next to each other holding wine glasses. | Two women drinking wine in a room. |
| 2 | A group of people walking down a street | A group of people standing in the street. | A group of people standing with an umbrella. | A group of people standing in the street with an umbrella. | A group of people standing under an umbrella. |
| 3 | A close up of a horse in a field. | A horse standing in a field. | A horse in the grass in a field. | A white horse standing in the grass in a field. | A white horse grazing in a field of grass. |
| 4 | A group of people on skis in the snow. | A man riding skis in the snow. | A group of people skiing down a snow covered slope. | A group of people riding skis down a snow covered slope. | Two men are skiing down a snow covered slope. |