Table 4.
Overview of the experiment
| Goal | Analyze and compare the textual image descriptions generated by the vision engines described in Sect. 3, to understand whether they differ in perceived correctness and how they differ from human-authored descriptions. |
| Research questions | RQ1: Is there any difference in perceived correctness among the descriptions generated by the considered tools? RQ2: Is there any difference in perceived correctness between the ground-truth descriptions provided by humans and those generated by the tools? |
| Context | Objects: descriptions of 60 images selected from Wikipedia, covering the three categories Human, Landmark, and General. Subjects: 76 computer science students. |
| Null hypothesis | The treatment has no effect on perceived correctness (measured on a 5-point Likert scale). |
| Treatments | Five: Wikipedia (manual), Azure Computer Vision Engine, Amazon Rekognition, CloudSight, and Auto Alt-Text for Google Chrome. |
| Dependent variable | Perceived correctness of the description with respect to the corresponding image. |
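The design above compares five treatments on an ordinal 5-point scale. The table does not state which statistical test is applied; a common choice for k independent samples of ordinal (Likert) data is the Kruskal-Wallis test. The sketch below is illustrative only: the function names and toy scores are not from the study.

```python
from collections import defaultdict

def average_ranks(values):
    """Assign 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend the window over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis(groups):
    """Kruskal-Wallis H statistic (with tie correction) for k samples."""
    data = [x for g in groups for x in g]
    n = len(data)
    ranks = average_ranks(data)
    h, pos = 0.0, 0
    for g in groups:
        r = ranks[pos:pos + len(g)]
        pos += len(g)
        mean_rank = sum(r) / len(r)
        h += len(g) * (mean_rank - (n + 1) / 2) ** 2
    h *= 12 / (n * (n + 1))
    # correction for ties (all-tied data would make the divisor zero)
    counts = defaultdict(int)
    for x in data:
        counts[x] += 1
    correction = 1 - sum(t**3 - t for t in counts.values()) / (n**3 - n)
    return h / correction if correction else float("nan")

# Hypothetical Likert scores (1-5), one list per treatment:
scores = {
    "Wikipedia": [5, 5, 4, 5],
    "Azure":     [4, 3, 4, 4],
    "CloudSight": [3, 4, 3, 2],
}
H = kruskal_wallis(list(scores.values()))
```

A large H (relative to a chi-squared distribution with k-1 degrees of freedom) would lead to rejecting the null hypothesis that the treatments do not differ in perceived correctness.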