Jey Han Lau
Published in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, July 26-31, 2015
Keywords: computational linguistics, natural language processing
Subject categories: Computer and Information Science
In this paper we present the task of unsupervised prediction of speakers' acceptability judgements. We use a test set generated from the British National Corpus (BNC) containing both grammatical sentences and sentences containing a variety of syntactic infelicities introduced by round-trip machine translation. This set was annotated for acceptability judgements through crowdsourcing. We trained a variety of unsupervised language models on the original BNC, and tested them to see the extent to which they could predict mean speakers' judgements on the test set. To map probability to acceptability, we experimented with several normalisation functions to neutralise the effects of sentence length and word frequencies. We found encouraging results, with the unsupervised models predicting acceptability across two different datasets. Our methodology is highly portable to other domains and languages, and the approach has potential implications for the representation and the acquisition of linguistic knowledge.
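As an illustration of the kind of normalisation the abstract describes, the sketch below shows two measures commonly used in this line of work to turn a language-model log probability into a length- and frequency-neutral score: mean log probability, and SLOR (the syntactic log-odds ratio of Pauls and Klein, 2012). The numeric inputs are hypothetical, not taken from the paper; this is a minimal sketch of the general technique rather than the authors' exact scoring functions.

```python
def mean_logprob(logprob: float, length: int) -> float:
    """Log probability per word: neutralises the effect of sentence length,
    since longer sentences otherwise always get lower log probabilities."""
    return logprob / length

def slor(logprob: float, unigram_logprob: float, length: int) -> float:
    """Syntactic Log-Odds Ratio (Pauls & Klein, 2012): subtracts the unigram
    log probability before length-normalising, so that sentences built from
    rare words are not penalised merely for their word frequencies."""
    return (logprob - unigram_logprob) / length

# Hypothetical language-model scores for a 5-word sentence.
lp = -20.0      # log P(sentence) under some sentence-level language model
uni_lp = -28.0  # sum of the unigram log probabilities of its words

print(mean_logprob(lp, 5))  # -4.0
print(slor(lp, uni_lp, 5))  # 1.6
```

Either score can then be compared or correlated against mean human acceptability ratings; the point of SLOR's unigram term is that a grammatical sentence of rare words and one of common words receive comparable scores.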