Programming language will translate Wikipedia into 300 languages

Aarne Ranta, professor at the Department of Computer and Information Technology, is working with the Wikimedia Foundation on a project called Abstract Wikipedia. The purpose of the project is to translate all Wikipedia articles into several different languages.

The goal is to let more people take part in Wikipedia's articles with the help of Professor Ranta's own translation tools and his programming language GF (Grammatical Framework).

I have set my sights on Abstract Wikipedia eventually being able to translate into 300 different languages, which is close to the number of languages that Wikipedia's platform can accommodate.

The original texts that the Abstract Wikipedia project translates are primarily created automatically from a factual database, called Wikidata. But the database also handles texts written by humans, which makes the texts more comprehensible and readable.

Isn't there a risk that the texts become too standardized and less interesting if all translations are based on the same original text?

– The articles in the different languages are linked together, which makes it possible for adjustments to one text to be inherited by its sister articles in the other languages. And it still works like Wikipedia in general, where the texts are alive and anyone can make changes and contribute to the content, says Professor Ranta.

And according to Professor Ranta, there is an advantage to having certain types of texts "written by a robot".

What are the advantages of having a text produced by a "robot"?

The texts produced by a robot are not as interesting and lively as those written by a person. But a text robot can create large parts of a text that would be boring to write, and a human can then reformulate them. If the texts need to follow a certain pattern, or if fact checking and source references must be correct, then the facts can be reconciled with the source more reliably than if a human had written the text, he says.

What does this project contribute to?

The benefit of this project is what we call the "Wikipedia vision", which means making knowledge available to the whole world. The more indirect benefit is that the methods we develop can be used for other things as well. You can imagine that Wikipedia is among the most complicated things you can tackle, but if you manage it, you can use the system for countless other purposes.

How long do you think this project will last?

The project will be as ongoing as Wikipedia. There is always something new that can be developed. What is important is that you start with something that can give results quite soon, instead of thinking that "this is a huge project" that can only start delivering results after five years. I like to think that we can already deliver something that is simple but works as it is.

A prototype that I have developed together with my colleague Krasimir Angelov can create texts in 24 languages. We've tested it with different types of content, such as geographic facts and Nobel laureates, and received input from our students doing their thesis work. In the future, it is important that we can involve the large Wikipedia community by making GF accessible without extensive training in the technology.

By: Agnes Ekstrand and Camilla Jara