
Page Manager: Webmaster
Last update: 9/11/2012 3:13 PM



Estimating Language Relationships from a Parallel Corpus. A Study of the Europarl Corpus

Conference paper
Authors Taraka Rama
Lars Borin
Published in NEALT Proceedings Series (NODALIDA 2011 Conference Proceedings)
Volume 11
Pages 161-167
ISSN 1736-6305
Publication year 2011
Published at Department of Swedish
Language en
Keywords genetic linguistics, historical linguistics, language phylogeny
Subject categories Language Technology (Computational Linguistics), Linguistics


Since the 1950s, linguists have been using short lists (40–200 items) of basic vocabulary as the central component in a methodology which is claimed to make it possible to automatically calculate genetic relationships among languages. In the last few years these methods have experienced something of a revival, in that more languages are involved, different distance measures are systematically compared and evaluated, and methods from computational biology are used for calculating language family trees. In this paper, we explore how this methodology can be extended in another direction, by using larger word lists automatically extracted from a parallel corpus using word alignment software. We present preliminary results from using the Europarl parallel corpus in this way for estimating the distances between some languages in the Indo-European language family.
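As a rough illustration of the word-list approach the abstract describes, the Python sketch below averages a length-normalized Levenshtein distance over the meanings two languages share. This record does not say which distance measures the paper uses, so the choice of measure, the function names, and the three-item toy word lists (hand-picked, not extracted from Europarl) are all assumptions made for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length, giving a value in [0, 1]."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def language_distance(words_a: dict, words_b: dict) -> float:
    """Mean normalized distance over the meanings present in both word lists."""
    shared = words_a.keys() & words_b.keys()
    if not shared:
        raise ValueError("word lists share no meanings")
    total = sum(normalized_distance(words_a[m], words_b[m]) for m in shared)
    return total / len(shared)


def distance_matrix(word_lists: dict) -> dict:
    """Pairwise language distances, keyed by alphabetically sorted language pairs."""
    return {
        (x, y): language_distance(word_lists[x], word_lists[y])
        for x, y in combinations(sorted(word_lists), 2)
    }


# Hypothetical toy data: meaning -> orthographic form, for illustration only.
WORD_LISTS = {
    "en": {"water": "water", "hand": "hand", "night": "night"},
    "de": {"water": "wasser", "hand": "hand", "night": "nacht"},
    "sv": {"water": "vatten", "hand": "hand", "night": "natt"},
}

if __name__ == "__main__":
    for pair, dist in distance_matrix(WORD_LISTS).items():
        print(pair, round(dist, 3))
```

A matrix of this kind is the sort of input that tree-building methods from computational biology take. With a parallel corpus, the word lists would come from word alignment output rather than being compiled by hand.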

