The Criminal Investigation Department was set up in the 1850s, and its officers were tasked with conducting investigations out of uniform. In 1868, the department established a permanent team of personnel, and from this point it kept “report books” containing copies of all its outgoing reports. The image shows the HTR processing of a police report from 1896.
Automatic transcription using machine learning
In recent years, research has made huge advances in the automated transcription of old, handwritten texts. Handwritten Text Recognition (HTR) is based on artificial intelligence: a computer programme is trained to interpret the shapes in an image of handwriting and translate them into characters. In the CID project, an extensive collection of police reports will be processed with HTR, which will automatically transcribe the handwritten text into a digital format.
The content of the report books offers enormous potential for many archive users. The results of the project will be made freely available to everyone via the Swedish National Archives’ Digital Research Room. Several GPS400 projects, including CID, will use the transcribed archival information as contextual information and metadata for photographs from the same period. Searching for and using information in the transcribed material will be very different from the traditionally time-consuming task of searching through analogue archives. It will be possible, for example, to run a free text search across the entire archive and to process large quantities of data quickly. The Swedish National Archives are also running a Vinnova-funded project to expand the use of HTR and to create a search interface for the transcribed data.
Now you can join in!
In spring 2020, a group of participants drawn from the general public manually transcribed 500 two-page spreads. An HTR model was then trained on this transcribed data to transcribe the remaining material automatically with 97 per cent accuracy. To achieve the highest possible quality, the project is once again inviting the public to take part, this time in correcting the automatically transcribed police reports. The work takes place over the internet and requires no prior knowledge beyond normal computer skills and some experience of reading texts from the turn of the 20th century. Participating in the project will give you an insight into the Gothenburg of this period and how its people lived. You will also be making a major contribution to future research, creating new opportunities to generate knowledge about the history of the city. Are you interested in helping out? Email email@example.com and we’ll tell you all about it. Come and join in!