Page manager: Webbredaktion
Page last updated: 2012-09-11 15:12


A Comparison of Character Neural Language Model and Bootstrapping for Language Identification in Multilingual Noisy Texts

Paper in proceedings
Authors Wafia Adouane
Simon Dobnik
Jean-Philippe Bernardy
Nasredine Semmar
Published in Proceedings of the Second Workshop on Subword and Character Level Models in NLP (SCLeM), June 6, 2018, New Orleans, Louisiana
ISBN 978-1-948087-18-6
Place of publication New Orleans, Louisiana, USA
Publication year 2018
Published at Department of Philosophy, Linguistics and Theory of Science
Language English
Links www.aclweb.org/anthology/W18-1203
https://gup.ub.gu.se/file/207490
Keywords Neural Language Model, Deep Neural Networks, under-resourced language, bootstrapping, code-switching, borrowing
Subject categories Language Technology (Computational Linguistics)

Abstract

This paper seeks to examine the effect of including background knowledge in the form of a character-level pre-trained neural language model (LM), and of data bootstrapping, to overcome the problem of unbalanced limited resources. As a test, we explore the task of language identification in mixed-language short non-edited texts with an under-resourced language, namely the case of Algerian Arabic, for which both labelled and unlabelled data are limited. We compare the performance of two traditional machine learning methods and a deep neural network (DNN) model. The results show that overall DNNs perform better on labelled data for the majority categories and struggle with the minority ones. While the effect of the untokenised and unlabelled data encoded as an LM differs for each category, bootstrapping improves the performance of all systems and all categories. These methods are language independent and could be generalised to other under-resourced languages for which a small amount of labelled data and a larger amount of unlabelled data are available.
