Page Manager: Webmaster
Last update: 9/11/2012 3:13 PM

Exploring the Quality of the Digital Historical Newspaper Archive KubHist

Conference paper
Authors: Yvonne Adesam, Dana Dannélls, Nina Tahmasebi
Published in: Proceedings of the 4th Conference of The Association Digital Humanities in the Nordic Countries (DHN), Copenhagen, Denmark, March 5-8, 2019 / edited by Costanza Navarretta, Manex Agirrezabal, Bente Maegaard
Publisher: University of Copenhagen, Faculty of Humanities
Place of publication: Copenhagen
Publication year: 2019
Published at: Department of Literature, History of Ideas, and Religion; Department of Swedish
Language: English
Keywords: Historical newspaper corpus; OCR errors; Spelling normalization
Subject categories: Language Technology (Computational Linguistics)


The KubHist Corpus is a massive corpus of Swedish historical newspapers, digitized by the Royal Swedish Library and available through Korp, the corpus infrastructure of Språkbanken. This paper presents a first overview of the KubHist corpus, explores some of the difficulties with the data, such as OCR errors and spelling variation, and discusses possible paths for improving its quality and searchability.