
The superdictionary, its objections, and irresistible tools

Culture and languages

Arvi Tavast, Institute of the Estonian Language, visits the higher seminar to talk about the Ekilex project. The long-term goal of the project is to actually join dictionaries into a single superdictionary, as opposed to linking between dictionaries or aggregating search across multiple dictionaries.

1 Feb 2021
13:15 - 15:00
Online via Zoom

Arvi Tavast, director of the Institute of the Estonian Language
Institutionen för svenska språket

The underlying assumption is that users look for information about words, not about dictionaries, which means that the current system of duplicated and conflicting dictionaries has no real use case. The timing of the project also coincides with the rise of automated, corpus-based processes to replace introspective lexicography.

Despite a consensus about user benefits, Ekilex has met active resistance from lexicographers, for four interconnected reasons:

  • Bringing a legacy dictionary into a structured database exposes its internal conflicts, previously hidden in disconnected articles, which is not a pleasant sight for the authors.
  • Resolving these data quality issues is still largely manual lexicographic work, with a volume that looks daunting if not unrealistic.
  • Authors feel uncomfortable with allowing uncertainty in a dictionary and trusting readers to draw their own conclusions.
  • Neither do authors trust each other enough to cooperate on a shared dictionary.

To overcome these objections, we aim to provide specialised tools that move processes towards more automation and deliver benefits so tangible for lexicographers that they can't be resisted. Development of the system is ongoing and iterative, so the situation will have changed by March, but in the presentation I'll be able to report on at least three and a half tools: two relatively straightforward batch processes (word joiner and meaning joiner), and a specialised tool for synonyms that is being extended to equivalents.

As part of the background, I'll also describe the Ekilex data model, which is new in that it expresses m:n relations between words and meanings, instead of the traditional 1:n of semasiology and m:1 of onomasiology.
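
To make the m:n idea concrete, here is a minimal Python sketch of a store in which one word can carry several meanings and one meaning can be expressed by several words. It is only an illustration under stated assumptions: the class `Lexicon`, its methods, and the example words and meaning identifiers are all hypothetical and do not reflect the actual Ekilex schema or API.

```python
from collections import defaultdict


class Lexicon:
    """Toy m:n store (hypothetical, not the Ekilex schema)."""

    def __init__(self):
        self._meanings_of = defaultdict(set)  # word -> set of meaning ids
        self._words_of = defaultdict(set)     # meaning id -> set of words

    def link(self, word, meaning_id):
        """Record that `word` expresses `meaning_id`; both indexes are updated."""
        self._meanings_of[word].add(meaning_id)
        self._words_of[meaning_id].add(word)

    def meanings(self, word):
        """Semasiological view: from a word to all of its meanings."""
        return sorted(self._meanings_of.get(word, ()))

    def words(self, meaning_id):
        """Onomasiological view: from a meaning to all words expressing it."""
        return sorted(self._words_of.get(meaning_id, ()))


# Invented example data: one word with two meanings, one meaning with two words.
lexicon = Lexicon()
lexicon.link("bank", "financial_institution")
lexicon.link("bank", "river_edge")
lexicon.link("shore", "river_edge")

print(lexicon.meanings("bank"))
print(lexicon.words("river_edge"))
```

Because every link is stored in both directions, the same data can be read word-first or meaning-first, so neither view has to be privileged as in a purely 1:n or m:1 layout.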