Page Manager: Webmaster
Last update: 9/11/2012 3:13 PM

A Comparison of Character Neural Language Model and Bootstrapping for Language Identification in Multilingual Noisy Texts

Conference paper
Authors: Wafia Adouane, Simon Dobnik, Jean-Philippe Bernardy, Nasredine Semmar
Published in: Proceedings of the Second Workshop on Subword and Character Level Models in NLP (SCLeM), June 6, 2018, New Orleans, Louisiana
ISBN: 978-1-948087-18-6
Place of publication: New Orleans, Louisiana, USA
Publication year: 2018
Published at: Department of Philosophy, Linguistics and Theory of Science
Language: en
Links: www.aclweb.org/anthology/W18-1203, https://gup.ub.gu.se/file/207490
Keywords: Neural Language Model, Deep Neural Networks, under-resourced language, bootstrapping, code-switching, borrowing
Subject categories: Language Technology (Computational Linguistics)

Abstract

This paper examines the effect of including background knowledge, in the form of a pre-trained character-level neural language model (LM), and of data bootstrapping to overcome the problem of limited and unbalanced resources. As a test, we explore the task of language identification in short, unedited, mixed-language texts involving an under-resourced language, namely Algerian Arabic, for which both labelled and unlabelled data are limited. We compare the performance of two traditional machine learning methods and a deep neural network (DNN) model. The results show that, overall, DNNs perform better on labelled data for the majority categories but struggle with the minority ones. The effect of the untokenised, unlabelled data encoded as an LM differs for each category, whereas bootstrapping improves the performance of all systems on all categories. These methods are language independent and could be generalised to other under-resourced languages for which a small labelled dataset and a larger unlabelled dataset are available.
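
The abstract does not spell out the bootstrapping procedure, so the following is only a minimal sketch of one common form of it (self-training), not the authors' implementation. It assumes a simple character-bigram Naive Bayes classifier in place of the systems compared in the paper; the class `CharNgramNB`, the function `bootstrap`, the confidence threshold, the label names, and the toy romanised words are all hypothetical and invented for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(word, n=2):
    """Return the character n-grams of a word, with boundary markers."""
    padded = f"^{word}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramNB:
    """Hypothetical stand-in classifier: Naive Bayes over character n-grams."""

    def __init__(self, n=2):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.label_counts = Counter()       # label -> number of training words
        self.vocab = set()

    def fit(self, words, labels):
        for word, label in zip(words, labels):
            self.label_counts[label] += 1
            for gram in char_ngrams(word, self.n):
                self.counts[label][gram] += 1
                self.vocab.add(gram)
        return self

    def predict_proba(self, word):
        """Return a dict mapping each label to its normalised probability."""
        total = sum(self.label_counts.values())
        vocab_size = len(self.vocab) + 1  # +1 leaves mass for unseen n-grams
        log_scores = {}
        for label, n_words in self.label_counts.items():
            denom = sum(self.counts[label].values()) + vocab_size
            score = math.log(n_words / total)
            for gram in char_ngrams(word, self.n):
                # Add-one smoothing keeps unseen n-grams from zeroing the score.
                score += math.log((self.counts[label][gram] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())
        exp_scores = {lab: math.exp(s - top) for lab, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {lab: e / norm for lab, e in exp_scores.items()}


def bootstrap(labelled, unlabelled, threshold=0.9, max_rounds=5):
    """Self-training: repeatedly add confidently labelled words to the training set.

    The threshold and round limit are assumptions, not values from the paper.
    """
    train = list(labelled)
    pool = list(unlabelled)
    model = CharNgramNB()
    for _ in range(max_rounds):
        words, labels = zip(*train)
        model = CharNgramNB().fit(words, labels)
        confident, remaining = [], []
        for word in pool:
            probs = model.predict_proba(word)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                confident.append((word, label))
            else:
                remaining.append(word)
        if not confident:
            break  # nothing new was learned; stop early
        train.extend(confident)
        pool = remaining
    return model, train


# Toy data invented for illustration; it is NOT drawn from the paper's corpus.
labelled = [("salam", "arabic"), ("kitab", "arabic"), ("shukran", "arabic"),
            ("bonjour", "french"), ("merci", "french"), ("maison", "french")]
unlabelled = ["marhaba", "fromage", "bonsoir", "kalam"]

model, train = bootstrap(labelled, unlabelled, threshold=0.7)
print(model.predict_proba("bonne"))
```

The threshold controls the usual trade-off in self-training: a higher value adds fewer but cleaner automatic labels, while a lower value grows the training set faster at the risk of reinforcing early mistakes, which matters most for minority categories.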
