The data includes 200 images of natural scenes (plaques, packaging instructions, small advertisements, menus, posters, etc.), Internet images (magazine covers, comic covers, etc.), and document images (text documents, etc.).
A dataset containing open-ended questions about images. Answering these questions requires an understanding of vision, language, and commonsense knowledge.