
Page Manager: Webmaster
Last update: 9/11/2012 3:13 PM


University of Gothenburg, Sweden

Sparv: Språkbanken’s corpus annotation pipeline infrastructure

Conference contribution
Authors: Lars Borin, Markus Forsberg, Martin Hammarstedt, Dan Rosén, Roland Schäfer, Anne Schumacher
Published in: SLTC 2016, The Sixth Swedish Language Technology Conference, Umeå University, 17-18 November 2016
Publication year: 2016
Published at: Department of Swedish
Language: English
Keywords: corpus linguistics, lexical analysis, compound analysis, automatic annotation
Subject categories: Language Technology (Computational Linguistics)


Sparv is Språkbanken's corpus annotation pipeline infrastructure. The easiest way to use the pipeline is through its web interface, with a plain-text document as input. The pipeline applies in-house and external tools to segment the text into paragraphs and sentences, tokenise it, tag parts of speech, look up words in dictionaries, and analyse compounds. The pipeline can also be run through a web API that returns its results as XML, and it is run locally at Språkbanken to prepare the documents in Korp, our corpus search tool. The pipeline supports 15 languages, with the most sophisticated support for modern Swedish.
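The abstract describes the pipeline as a chain of annotation steps whose output can be delivered as XML. The minimal Python sketch below illustrates only that general idea; it is not Sparv's code or API. The lexicon, the compound-part inventory, the tag labels, the XML element names (`text`, `paragraph`, `sentence`, `w`) and all helper functions are hypothetical, and every stage is deliberately naive.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy lexicon standing in for dictionary lookup (not Sparv data).
LEXICON = {
    "hunden": "NOUN",
    "springer": "VERB",
    "i": "ADP",
    "parken": "NOUN",
    "fotboll": "NOUN",
}

# Hypothetical inventory of compound parts; a real analyser is far richer.
COMPOUND_PARTS = {"fot", "boll", "park"}


def split_paragraphs(text):
    """Paragraphs are separated by one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]


def tokenise(sentence):
    """Words and single punctuation characters become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def tag(token):
    """Look the token up in the toy lexicon; punctuation gets PUNCT, unknowns UNK."""
    if not re.fullmatch(r"\w+", token):
        return "PUNCT"
    return LEXICON.get(token.lower(), "UNK")


def analyse_compound(token):
    """Return 'head+tail' if the word splits into two known parts, else None."""
    word = token.lower()
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in COMPOUND_PARTS and tail in COMPOUND_PARTS:
            return f"{head}+{tail}"
    return None


def annotate(text):
    """Run each stage in order and serialise the annotations as XML."""
    root = ET.Element("text")
    for para in split_paragraphs(text):
        p_el = ET.SubElement(root, "paragraph")
        for sent in split_sentences(para):
            s_el = ET.SubElement(p_el, "sentence")
            for tok in tokenise(sent):
                w_el = ET.SubElement(s_el, "w", pos=tag(tok))
                compound = analyse_compound(tok)
                if compound:
                    w_el.set("compound", compound)
                w_el.text = tok
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(annotate("Hunden springer i parken.\n\nFotboll!"))
```

Keeping each stage as a separate function mirrors, in a very simplified way, the idea of chaining independent in-house and external tools: any stage could be swapped for a stronger tool without touching the others, and the XML serialisation step is kept separate from the analysis itself.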

