Page Manager: Webmaster
Last update: 9/11/2012 3:13 PM

Evaluation and refinement of an enhanced OCR process for mass digitisation.

Conference paper
Authors Dana Dannélls
Torsten Johansson
Lars Björk
Published in Proceedings of the Digital Humanities in the Nordic Countries 4th Conference (DHN 2019), Copenhagen, Denmark, March 5-8, 2019. Edited by: Costanza Navarretta, Manex Agirrezabal, Bente Maegaard
Publisher University of Copenhagen, Faculty of Humanities
Place of publication Copenhagen
Publication year 2019
Published at Department of Swedish
Language en
Links ceur-ws.org/Vol-2364/
Keywords Language Technology, OCR, Digitisation
Subject categories General Language Studies and Linguistics, Language Technology (Computational Linguistics)

Abstract

Great expectations are placed on the capacity of heritage institutions to make their collections available in digital format. Data-driven research is becoming a key concept within the humanities and social sciences. Kungliga biblioteket's (National Library of Sweden, KB) collections of digitised newspapers can thus be regarded as unique cultural data sets with information that is rarely conveyed in other media types. The digital format makes it possible to explore these resources in ways not feasible while in printed form. As texts are no longer only read but also subjected to computer-based analysis, the demand for correct rendering of the original text increases. OCR technologies for converting images to machine-readable text play a fundamental part in making these resources available, but their effectiveness varies with the type of document being processed. This is evident in the digitisation of newspapers, where factors relating to their production, layout and paper quality often impair the OCR output. In order to improve the machine-readable text, especially for digitised newspapers, KB initiated the development of an OCR module in which key parameters can be adjusted according to the characteristics of the material being processed. The purpose of this paper is to present the project goals and methods.
