University of Gothenburg

Language Technology

Language technology is a subfield of artificial intelligence (AI), the field of computer science that develops systems capable of performing tasks that typically require human intelligence, such as learning, reasoning, and problem-solving. Language technology (LT) focuses specifically on understanding, interpreting, and generating human language. Language technology research is primarily conducted by researchers at the Språkbanken Text unit.

Språkbanken Text

Språkbanken Text (SBX) is part of Språkbanken, a national infrastructure supporting research based on language data. Here, open linguistic research data are developed, refined, and made accessible, with a particular focus on the Swedish language throughout its history. Språkbanken Text develops and provides open digital research platforms, with the aim of supporting all types of research in which linguistic data play a central role. The unit also conducts its own research in language technology, including language-based AI, and participates in projects across other academic disciplines.

Language Technology in practice

Language technology plays an increasingly significant role in research in the humanities and related fields: linguistics, history, sociology, political science, lexicography, literary studies, education, and many others. Well-known practical LT applications include Google Translate, DeepL, ChatGPT, Grammarly, and Siri.

Research at Språkbanken Text

Språkbanken Text primarily works to enable computers to process human language in textual form. When language technology is applied to and integrated with the humanities – drawing on digital tools, computational methods, and large datasets – it gives rise to what is commonly referred to as the digital humanities. Research at Språkbanken Text strategically combines language technology with research in the digital humanities and related fields. Examples of projects pursued at Språkbanken Text include automatic pseudonymization; privacy, bias, and fairness in AI; automatic detection of lexical semantic change; AI-based analysis of language data for insights into cognitive and health-related patterns; and automatic analysis of learner language. Read more under Current Projects.

Research infrastructure and tools

Language technology relies heavily on advances in machine learning and on the availability of large-scale digital data across a wide range of research domains. Research at Språkbanken Text often generates novel linguistic resources, expert-annotated datasets, models, and tools, which are made accessible to researchers and the general public through the language technology infrastructure maintained at Språkbanken Text.

On Språkbanken Text's website, you can explore large collections of Swedish text with the Korp tool, search Swedish electronic lexicons with the Karp tool, and use a range of other platforms.

Seminar Series 

Språkbanken Text’s research forum

The forum alternates between language technology seminars open to everyone, paper- and proposal-writing sessions for researchers and PhD students at Språkbanken, and internal discussion meetings. Seminar talks are given by invited speakers, guest researchers, or Språkbanken Text’s own researchers.

See seminar program

AI for the Humanities and the Humanities for AI, HumAI

The seminars are coordinated between two language technology (LT) groups: one at the Department of Philosophy, Linguistics and Theory of Science (FLoV) and one at the Department of Swedish, Multilingualism, Language Technology (SFS). The initiative aims to promote language technology and AI research among humanities researchers at the University of Gothenburg, but the seminars are also open to anyone outside the university. The HumAI seminars take place once a month.

See seminar program

Current Projects

Here you can read about some of the projects being carried out by Språkbanken Text.

 

AI-driven language biomarkers for early detection and progression of cognitive decline

Language is a cognitive function that is often affected early in cognitive decline, and changes in language can potentially signal early-stage dementia. Yet, despite extensive research, subtle linguistic markers in at-risk individuals remain poorly understood, highlighting the need for new investigative approaches. We propose integrating speech and language analysis with neuropsychological tests and biomarkers, using large-scale, clinically validated datasets for robust, scalable analysis.

Go to the project homepage (spraakbanken.gu.se)

 

The voices of the enslaved: corpus-based discourse analysis of historical slave narratives (De förslavades röster: korpus-baserad diskursanalys av historiska slav-narrativ)

Living standards are a key topic in economic-historical research. Enslaved individuals have been omitted from most such research because the records conventionally used to study the topic are lacking for them. This project studies how enslaved individuals in the 19th-century United States described their living standards, using a large body of “slave narratives” – autobiographical texts by, or interviews with, formerly enslaved individuals. Previous research estimates that around 5,000 such narratives exist in various collections of historical records; these are assembled into an annotated digital corpus. The corpus is analysed through machine-assisted, researcher-driven corpus-linguistic discourse analysis focusing on the socio-economic factors at the heart of the project: how the narrators described their material living standards (possession of material objects) and their non-material living conditions (trauma of coercion and violence), whether these descriptions were influenced by social, cultural, or geographical factors, and whether they changed over time, most importantly following emancipation from slavery.

Go to the project homepage (spraakbanken.gu.se)

 

HUMINFRA

HUMINFRA is a new distributed national research infrastructure for the humanities, arts, and social sciences. Research in these fields is becoming increasingly interdisciplinary, quantitative, and multi-methodological. Humanities researchers, for example, use eye-tracking and keystroke logging to study schoolchildren’s reading and writing development, 3D technology and AI-based image analysis to study rock carvings, and they combine medieval maps with register data on population statistics. 

This development is leading to a growing need for technology, digital/e-science resources, tools, and expertise that complement the traditional infrastructures of libraries and archives. Today, new digital resources, databases, and digital tools are created by individual researchers and research groups at many universities across the country. However, established structures that facilitate the creation, accessibility, and use of these resources and methods are still lacking. 

Information about and access to resources in these fields, including courses and training opportunities, are scattered and difficult to find and survey. To strengthen the field nationally and internationally, HUMINFRA therefore brings together highly specialized expertise from eleven universities and organizations in e-science, digital materials and tools, and experimental and quantitative methods for the humanities, social sciences, and the arts (HSS).

Go to the project homepage (spraakbanken.gu.se)

 

Mapping Social Stratification in the Making of Modern Argentina, 1850–1900: a Micro-Level Analysis

Social stratification has regained prominence in research, particularly in the Global South, where inequality remains a persistent challenge. However, substantive and methodological limitations constrain this scholarship’s ability to examine the causes and consequences of social stratification, often reducing inequality analyses to aggregate economic indicators that overlook spatiality and multidimensionality. 

This project addresses these challenges by investigating social stratification in Argentina between 1850 and 1900, a pivotal period of development for a country that was once among the world's wealthiest but is now marked by high inequality and economic instability. Leveraging advanced OCR techniques to digitise individual-level census data, the project reconstructs multidimensional indicators of social stratification, such as occupational structure, literacy, social mobility, labour participation, and partner selection, at both regional and national levels over time. This approach overcomes data limitations and, by incorporating intersecting factors such as gender, age, and migratory status, provides new insights into the evolution of social stratification.
 
Furthermore, the project introduces a flexible, reproducible framework for extracting and analysing handwritten tabular data, offering a methodological contribution applicable to historical datasets worldwide.

Go to the project homepage (spraakbanken.gu.se)

 

Grandma Karl is 27 years old: Automatic pseudonymization of research data

Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared because they contain personal and sensitive information, e.g. names or political opinions. The GDPR suggests pseudonymization as a solution, but we need to learn more about it before adopting it for processing research data.

This research environment targets several aspects of pseudonymization, aiming to advance Sweden's work on open access to research data:

  1. algorithms to automatically detect, label, and pseudonymize personal identifiers in freely written texts (essays, blogs), focusing on linguistic challenges such as spelling errors, ambiguous entities, and semantic constraints

  2. analysis of the types and number of personal identifiers relative to an acceptable level of protection, followed by re-identification tests to ensure that pseudonymization is effective

  3. analysis of the effects of pseudonymization on research data, e.g. on the readability of the resulting texts, their utility for answering the intended research questions, and their applicability to practical scenarios (e.g. language assessment)
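Item 1 above describes detecting, labelling, and pseudonymizing personal identifiers. The sketch below illustrates only the last step and is not the project's actual method: the spans and labels are assumed to come from some upstream detector, and the labels (`FIRSTNAME`, `CITY`), the surrogate lists, and the function `pseudonymize` are all hypothetical. Each detected identifier is replaced with a surrogate of the same type, and repeated mentions of the same identifier receive the same surrogate so the text stays coherent.

```python
import itertools

# Hypothetical surrogate pools, keyed by hypothetical labels.
# A real system would draw on large, linguistically appropriate lists.
SURROGATES = {
    "FIRSTNAME": ["Anna", "Erik", "Sara", "Johan"],
    "CITY": ["Borås", "Luleå", "Visby"],
}


def pseudonymize(text, spans):
    """Replace labelled spans in `text` with surrogates of the same label.

    `spans` is a list of (start, end, label) character offsets, assumed to
    come from an upstream detector and not to overlap. Returns the
    pseudonymized text and the mapping that was used.
    """
    # Pools cycle when exhausted, so distinct originals may share a surrogate.
    pools = {label: itertools.cycle(names) for label, names in SURROGATES.items()}
    mapping = {}  # (label, original string) -> surrogate
    pieces = []
    prev = 0
    for start, end, label in sorted(spans):
        key = (label, text[start:end])
        if key not in mapping:
            # First mention of this identifier: draw the next surrogate.
            mapping[key] = next(pools[label])
        pieces.append(text[prev:start])  # unchanged text before the span
        pieces.append(mapping[key])      # consistent surrogate for the span
        prev = end
    pieces.append(text[prev:])           # unchanged text after the last span
    return "".join(pieces), mapping


text = "Karl lives in Gothenburg. Karl likes Gothenburg."
spans = [(0, 4, "FIRSTNAME"), (14, 24, "CITY"), (26, 30, "FIRSTNAME"), (37, 47, "CITY")]
print(pseudonymize(text, spans)[0])  # Anna lives in Borås. Anna likes Borås.
```

This simple version does not handle the semantic constraints mentioned in item 1 (for example, choosing a surrogate name compatible with the person's gender), and because the pools wrap around, different identifiers can end up sharing a surrogate once a pool is exhausted.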

Go to the project homepage (github.io)

 

The Swedish Academy’s contemporary dictionaries (Svenska Akademiens samtidsordböcker)

Within the framework of the project, the Swedish Academy’s lexical database (Salex) is maintained and further developed. In addition, work is carried out on the Swedish Academy’s two contemporary dictionaries: Svenska Akademiens ordlista (SAOL) and Svensk ordbok utgiven av Svenska Akademien (SO). This work is conducted on behalf of and in collaboration with the Swedish Academy. 

Both dictionaries are described in detail on their respective websites.

SAOL and SO are available, together with the Swedish Academy Dictionary (SAOB), on the Swedish Academy’s dictionary portal svenska.se, among other places.

Go to the project homepage (spraakbanken.gu.se)

 

Change is Key!

Change is Key! is a six-year research program (spanning 2022–2027) dedicated to developing computational tools that trace how language, society, and culture evolve over time. We develop corpus-based methods for detecting semantic change and semantic variation, and apply these tools to large-scale textual data to uncover the dynamics of linguistic, societal, and cultural change, both historically and in contemporary settings. By collaborating closely with scholars in the social sciences, gender studies, and literary studies, we address field-specific research questions and enable interdisciplinary insights through computational methods.

Go to the project homepage (spraakbanken.gu.se)