
Page Manager: Webmaster
Last update: 9/11/2012 3:13 PM



Learning to compose spatial relations with grounded neural language models

Authors: Mehdi Ghanimifard, Simon Dobnik
Published in: Second International Workshop on Models and Representations in Spatial Cognition (MRSC). Schloss Hohentübingen, Tübingen, Germany; April 06-07, 2017
Publication year: 2017
Published at: Department of Philosophy, Linguistics and Theory of Science
Language: English
Keywords: spatial cognition, machine learning, deep learning, neural networks, language model, grounding, language, perception
Subject categories: Cognitive science, Linguistics, Computer science, Computational linguistics


Neural language models are common in recent applications of neural networks to machine translation, speech recognition, and image captioning. Vector representations of linguistic units (such as words) are inputs to these models, which learn to generate meaningful compositions of them. The intermediate composed symbolic representations (word vectors) can also be grounded in meaningful composed representations of another modality (images, or sensors). An interesting question is what the correspondence is between linguistic compositions and compositions in another modality. Common evaluation metrics do not sufficiently express the performance of different neural models in learning compositions. They tell us about the performance of the learned representations on the evaluation dataset, but they say nothing about the internal structure of these representations. The question we want to answer is to what degree neural language models learn to compose grounded spatial expressions, and to what degree the learned model corresponds to the model that generated the composed data. We produce a synthetic dataset of composed spatial templates corresponding to composed linguistic expressions, using a known compositional function. We use simple spatial templates from Logan and Sadler (1996). Our neural network learns composed spatial templates from individual locations of target-landmark pairs, incrementally grounded in the sequence of words that have been passed to it. To evaluate the performance of the network, we compare generations from different setups of grounded neural language models with the original composed templates and consider their similarity with the (known) function that produced them.
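The setup of composing spatial templates with a known compositional function can be sketched as follows. This is a toy illustration only: it assumes Gaussian-shaped acceptability fields and an elementwise product as the compositional function, neither of which is specified by the abstract (the paper's templates follow Logan and Sadler (1996)).

```python
import numpy as np

def gaussian_template(grid_size, center, sigma):
    """A toy spatial template: acceptability of each grid cell as the
    location of a target, relative to a landmark. Modeled here as a 2D
    Gaussian peaked at `center` (an illustrative assumption, not the
    empirically measured templates of Logan and Sadler, 1996)."""
    ys, xs = np.mgrid[0:grid_size, 0:grid_size]
    cy, cx = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def compose(t1, t2):
    """A hypothetical known compositional function: elementwise product
    of two templates, renormalized so the peak acceptability is 1."""
    t = t1 * t2
    return t / t.max()

# Landmark at the grid center (8, 8); one template for "above" it and
# one for "left of" it, on a 16x16 grid of candidate target locations.
above = gaussian_template(16, center=(4, 8), sigma=2.0)
left = gaussian_template(16, center=(8, 4), sigma=2.0)

# The composed template for "above and to the left of" peaks midway
# between the two component peaks, at (6, 6).
above_and_left = compose(above, left)
print(np.unravel_index(above_and_left.argmax(), above_and_left.shape))
```

Given such ground-truth composed templates, the question the paper asks can be framed as: does a grounded neural language model, trained on (word sequence, location) pairs, recover a composition that matches `compose` above, or only one that scores well on held-out data?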

