Page Manager: Webmaster
Last update: 9/11/2012 3:13 PM

What goes into a word: generating image descriptions with top-down spatial knowledge

Conference paper
Authors Mehdi Ghanimifard, Simon Dobnik
Published in Proceedings of the 12th International Conference on Natural Language Generation (INLG-2019)
Publisher Association for Computational Linguistics
Place of publication Tokyo, Japan
Publication year 2019
Published at Department of Philosophy, Linguistics and Theory of Science
Language English
Links https://www.inlg2019.com/assets/pap...
https://gup.ub.gu.se/file/207900
https://gup.ub.gu.se/file/207901
Keywords spatial descriptions, grounded neural language models, attention, representation learning
Subject categories Cognitive science, Linguistics, Computational linguistics

Abstract

Generating grounded image descriptions requires associating linguistic units with their corresponding visual cues. A common method is to train a decoder language model with an attention mechanism over convolutional visual features. The attention weights align visual features, arranged by their spatial location in the image, with tokens (most commonly words) in the target description. However, words such as spatial relations (e.g. "next to" and "under") do not refer directly to geometric arrangements of pixels but to complex geometric and conceptual representations. The aim of this paper is to evaluate which representations facilitate generating image descriptions with spatial relations and lead to better grounded language generation. In particular, we investigate the contribution of four different representational modalities to generating relational referring expressions: (i) (pre-trained) convolutional visual features, (ii) spatial attention over visual features, (iii) top-down geometric relational knowledge between objects, and (iv) world knowledge captured by contextual embeddings in language models.
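The "common method" mentioned in the abstract, a decoder that attends over convolutional visual features, can be illustrated with a minimal sketch of soft spatial attention. This is not the authors' model: the function names, the plain dot-product scoring function (many captioning systems use an additive/MLP score instead), and the toy 2×2 feature grid are assumptions chosen for illustration only. Each grid cell stands for one image location, and the decoder's hidden state decides how much weight each location receives.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def spatial_attention(features: List[Vector], hidden: Vector) -> Tuple[List[float], Vector]:
    """Soft attention over a flattened grid of visual feature vectors (illustrative sketch).

    features: one feature vector per grid cell, e.g. a 7x7 CNN feature map
              flattened to 49 cells (hypothetical setup).
    hidden:   the decoder's current hidden state; assumed here to have the same
              dimensionality as each cell's feature vector.
    Returns (weights, context): one attention weight per cell, and the
    attention-weighted sum of the cell features.
    """
    if not features:
        raise ValueError("need at least one grid cell")
    dim = len(features[0])
    if any(len(f) != dim for f in features) or len(hidden) != dim:
        raise ValueError("all feature vectors and the hidden state must share one dimension")
    # Assumed scoring function: dot product between the hidden state and each cell.
    scores = [dot(f, hidden) for f in features]
    weights = softmax(scores)
    # Context vector: each dimension is the weighted sum over all cells.
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Toy 2x2 grid (flattened row by row) with 2-dimensional features.
    grid = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 1.0]]
    state = [2.0, 0.0]
    w, c = spatial_attention(grid, state)
    print([round(x, 3) for x in w])  # [0.44, 0.06, 0.06, 0.44]
    print([round(x, 3) for x in c])  # [0.881, 0.5]
```

In this sketch every weight is tied to a single grid location, so the attention can indicate where something is, but a relation such as "next to" concerns the arrangement of two objects rather than any one cell. That is, in simplified form, the gap the abstract points to when it contrasts spatial attention (ii) with top-down geometric relational knowledge (iii).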