AI transcribes Norwegian speech almost as well as humans


In broad Bergen dialect we said “Hallaien Tjommi, ke de’ går i?”. This was the result.

Photo: Vilde M. Horvei / Tek.no

Vilde M. Horvei

Speech recognition based on artificial intelligence now understands Norwegian speech so well that it can transcribe almost as well as humans.

A test carried out by the Language Bank at the National Library shows that factors such as dialect, gender and background noise have little effect on the quality of the transcriptions. These are factors that previously degraded speech recognition.

– It surprised us how well the systems handled this, says language technologist at the National Library, Marie Iversdatter Røsok, to Forskning.no.

She highlights new technology and better access to Norwegian training data as crucial to this development. Nevertheless, some things remain challenging for this type of transcription.

– What remains, and what drags the results down, is overlapping speech. That means ordinary conversations where several people talk freely at once, she says.

Shuffles the sentence structure

The National Library’s language models, NB-Whisper, are based on Whisper, the speech recognition technology from OpenAI, the company behind ChatGPT. These models also deliver the best results in both Bokmål and Nynorsk.

These systems transform even unstructured speech into shorter, grammatically correct sentences which, according to Røsok, reproduce the meaning well. She believes this is revolutionary, as the systems transcribe not only what you say, but what you mean.

She highlights an example from the test, where this sentence was uttered:

– And then my brother and dad were in there with mum, and then they called after an hour and said that now you have to come.

According to NB-Whisper’s transcription, the sentence read as follows:

– My brother and dad were with mum. They called after an hour and said I had to come.

In other words, the language model has reshuffled the sentence structure, and converted spoken language into written language in a natural way.
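The gap between what was literally said and what was written down can be put into numbers. The article does not say which metric the Language Bank used, so the following is only a minimal sketch of one common measure, word error rate (WER): the number of word insertions, deletions and substitutions needed to turn the transcript into the spoken sentence, divided by the number of spoken words. The function names `tokenize` and `word_error_rate` are illustrative, and the two sentences are the English renderings quoted above.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Example sentences taken from the article (English renderings).
spoken = ("And then my brother and dad were in there with mum, and then they "
          "called after an hour and said that now you have to come.")
transcribed = ("My brother and dad were with mum. They called after an hour "
               "and said I had to come.")

print(f"WER: {word_error_rate(spoken, transcribed):.2f}")
```

A transcript that tidies up spoken language like this scores worse on a strictly word-for-word measure even when the meaning is intact, which is why such numbers should be read alongside a human judgment of whether the sense was preserved.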

Can make it easier for people with disabilities

Because the system does not simply reproduce speech word for word, but shortens and restructures sentences, Røsok believes it also makes automatic subtitling possible.

Besides freeing up resources, Røsok believes this will benefit people with disabilities.

– This applies both to how much material is subtitled and to how quickly they will be able to access it, says Røsok.

Tek tests – in dialect

It went only so-so when the language model had to interpret what our boss from Skjåk was trying to say.

After so much praise, we naturally had to test the National Library’s language model ourselves. We therefore made a very simple, yet challenging attempt to test the model’s ability to interpret broad Bergen dialect. It went so-so.

A very Bergen phrase, and one frequently used in the rainy city, goes like this: “Hallaien, Tjommi! Ke de’ går i?”. Translated into standard Norwegian, it roughly means “Hello mate, how are you?”.

The National Library’s language model managed the transcription only passably: “Hallaien, Tjommi, what’s going on?”. In other words, it is able to transcribe word for word, and it hears the Bergen pronunciation quite well. But it apparently cannot yet recognize meaning that depends on city- and dialect-specific slang.

When we went a step further and tested recognition with dialect from Skjåk, it went even worse. The actual sentence is “Du æ ei jælma luggum jente”, which means “you are a very pleasant/good girl”.

The result we got was a word-for-word attempt, but not a correct one: “You are a helmeted luggum girl”.

You can hear the audio clips below.

Here you will find the language models

A demo of NB-Whisper is open to everyone, and the models are also available to anyone developing their own speech recognition apps. One example is VG’s own Jojo, which anyone can download for Mac for free.
