
Page Manager: Webmaster
Last update: 9/11/2012 3:13 PM


From the paft to the fiiture: A fully automatic NMT and word embeddings method for OCR post-correction

Conference paper
Authors Mika Hämäläinen
Simon Hengchen
Published in International Conference Recent Advances in Natural Language Processing, RANLP
ISSN 1313-8502
Publication year 2019
Published at Department of Swedish
Language en
Subject categories Language Technology (Computational Linguistics), Computer and Information Science

Abstract

Many historical corpora suffer from errors introduced by the OCR (optical character recognition) methods used in the digitization process. Correcting these errors manually is time-consuming, and many of the automatic approaches have relied on rules or supervised machine learning. We present a fully automatic, unsupervised method for extracting parallel data to train a character-based sequence-to-sequence NMT (neural machine translation) model for OCR error correction.
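
The abstract does not spell out how the parallel data is extracted, so the following is only a minimal sketch of one plausible reading, based on the title's mention of word embeddings. It assumes that nearest-neighbour lists from a word embedding model are already available as a plain dictionary, that a lexicon of correctly spelled words exists, and that an edit-distance threshold separates OCR variants from unrelated words. The names `extract_pairs`, `to_char_sequence`, `neighbours`, `lexicon` and `max_dist` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple


def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def extract_pairs(neighbours: Dict[str, List[str]],
                  lexicon: Set[str],
                  max_dist: int = 2) -> List[Tuple[str, str]]:
    """Collect (noisy, correct) pairs (assumed heuristic, not the paper's).

    A neighbour counts as a likely OCR error of a lexicon word if it is
    not itself in the lexicon and lies within max_dist edits of it.
    """
    pairs = []
    for correct, candidates in neighbours.items():
        if correct not in lexicon:
            continue
        for cand in candidates:
            if cand in lexicon:
                continue  # a real word, not an OCR variant
            if 0 < levenshtein(cand, correct) <= max_dist:
                pairs.append((cand, correct))
    return pairs


def to_char_sequence(word: str) -> str:
    """Split a word into space-separated characters for a character-level model."""
    return " ".join(word)


# Toy data standing in for embedding neighbours (hypothetical values).
neighbours = {
    "future": ["fiiture", "past", "futurc", "tomorrow"],
    "past": ["paft", "future"],
}
lexicon = {"future", "past", "tomorrow"}

pairs = extract_pairs(neighbours, lexicon)
training_lines = [(to_char_sequence(src), to_char_sequence(tgt))
                  for src, tgt in pairs]
```

Each resulting pair would give one source and one target line of character tokens, which is the input shape a character-based sequence-to-sequence model typically expects. The threshold and the lexicon filter are illustrative choices only.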
