
Page Manager: Webmaster
Last update: 9/11/2012 3:13 PM



Integrating language resources in two OCR engines to improve processing of historical Swedish text.

Authors: Dana Dannélls, Leif-Jöran Olsson
Published in: CLARIN Annual Conference
Publication year: 2018
Published at: Department of Swedish
Language: en
Keywords: OCR, Historical Swedish text, Language models
Subject categories: Other Humanities not elsewhere specified; Language Technology (Computational Linguistics)


We aim to address the difficulties that many researchers in history and the social sciences face when bringing non-digitized text into language analysis workflows. In this paper, we present the language resources and material we used to train two Optical Character Recognition (OCR) engines for processing historical Swedish text written in Fraktur (blackletter). The trained models, resources, and dictionaries are freely available through our web service, hosted at Språkbanken. The service gives users and developers easy access for extracting historical Swedish text that is only available as images, so that it can be processed further.

