2021 Nov 30;21(23):7982. doi: 10.3390/s21237982

Table 4.

Sample image captions generated by our PW, our CW, and the baseline, alongside the ground-truth (GT) captions.

Image Captions
Image: sensors-21-07982-i001.jpg
Baseline: A couple of women standing next to each other.
Our PW: Two women standing next to each other holding wine glasses.
Our CW: Two women drinking wine in a room.
GT1: Two young women are sharing a bottle of wine.
GT2: Two female friends posing with a bottle of wine.
GT3: Two women posing for a photo with drinks in hand.

Image: sensors-21-07982-i002.jpg
Baseline: A group of people walking down a street.
Our PW: A group of people standing in the street with an umbrella.
Our CW: A group of people standing under an umbrella.
GT1: Several people standing on a sidewalk under an umbrella.
GT2: Some people standing on a dark street with an umbrella.
GT3: Some people standing on a dark street with an umbrella.

Image: sensors-21-07982-i003.jpg
Baseline: A close up of a horse in a field.
Our PW: A white horse standing in the grass in a field.
Our CW: A white horse grazing in a field of grass.
GT1: A horse eating grass in a green field.
GT2: A while horse bending down eating grass.
GT3: A tall black and white horse standing on a lush green field.

Image: sensors-21-07982-i004.jpg
Baseline: A group of people on skis in the snow.
Our PW: A group of people riding skis down a snow covered slope.
Our CW: Two men are skiing down a snow covered slope.
GT1: Two cross country skiers heading onto the trail.
GT2: Two guys cross country ski in a race.
GT3: Skiers on their skis ride on the slope while others watch.