Kdenlive Speech to Text Tool. This is my experience

Last week, Pablinux told you about the new version of Kdenlive, the video editing tool from the KDE project. As I have said before, I prefer OpenShot, which has a gentler learning curve, but since I was very interested in the speech-to-text tool included in this new version, I decided to take a look at it.

Although I have written my share of articles on Linux alternatives to this or that Windows program (no one can call themselves a Linux blogger without having written one of those), this is not an approach that I like. I think programs should be discussed on their own merits. If I had to define Kdenlive, I would say that it is a video editor for hobbyists who want their creations to look professional.

I have said it before and I stand by it (come at me one at a time): free and open source software has libraries for multimedia work that make Adobe and Blackmagic products look like mere toys. The big problem is that nobody was interested in putting these tools together behind a simple, attractive interface with complete, easy-to-understand documentation. Although Kdenlive is still far from that goal, its developers are on the right track.

To convert speech to text, Kdenlive relies on two tools from the Python Package Index (PyPI) repository.

Vosk is an open source, offline speech recognition toolkit. It offers speech recognition models for 17 languages and dialects: English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, and Filipino.

Kdenlive uses Vosk models through a module written in Python.

However, having the transcript is not enough. It also has to be synced with the video. For this we need another Python module, one for creating subtitles.
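To make the syncing step more concrete, here is a minimal Python sketch. It is not Kdenlive's actual code. It assumes word timings in the shape Vosk reports when word-level output is enabled (entries with `word`, `start` and `end`, the times in seconds), and it writes the SubRip (.srt) text by hand with the standard library instead of calling the srt module. The function names, the sample words and the words-per-subtitle limit are all hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group timed words into numbered cues and return them as SRT text."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = srt_timestamp(chunk[0]["start"])   # cue starts with its first word
        end = srt_timestamp(chunk[-1]["end"])      # and ends with its last word
        text = " ".join(w["word"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Hypothetical word timings, shaped like the assumed recognizer output.
sample_words = [
    {"word": "hello", "start": 0.5, "end": 0.9},
    {"word": "and", "start": 1.0, "end": 1.2},
    {"word": "welcome", "start": 1.25, "end": 1.8},
    {"word": "back", "start": 3661.0, "end": 3661.4},
]

print(words_to_srt(sample_words, max_words=3))
```

Splitting by a fixed number of words is the simplest possible rule. A real tool would more likely also break on pauses and on line length.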

Kdenlive will check that you have these modules installed. To install them, first install the python3-pip package on your distribution and then run these commands:

pip3 install vosk

pip3 install srt
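If you want to confirm by hand that both modules ended up where Python can find them, a small check like the following may help. It is only a sketch: it tests the interpreter you run it with, which is an assumption, since Kdenlive may use a different Python on your system. The function name is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["vosk", "srt"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("vosk and srt are both available")
```

If something is reported as missing, re-running the matching pip3 command above is the first thing to try.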

Next, we have to install the voice models. To do this, open Kdenlive and go to Settings > Configure Kdenlive > Speech to Text.

To load the models you have two options: download the models from this page and load them manually (you must first check the Custom models folder box), or paste in a link from the list shown on that same page.

Using the Speech to Text tool

Make sure the subtitles option is enabled in the View menu. Next, load the video you want to transcribe.
Move the video to the first video track and drag the blue line over the length you want to transcribe.
Click on the subtitles tab and then on the + sign.
A track is added at the top. Click on the icon to the left of the eye.
Select the transcription model and whether you want to transcribe a clip, all the clips in the timeline, or a part of the timeline. Click on Process.

I compared Speech to Text with the free version of a cloud tool, and I have seen auto-captioned videos on YouTube and on paid course platforms. I have to say that it is not perfect, but it is no worse than those alternatives. It has problems when speakers do not have good diction or speak over music or some other sound. But, anticipating the question you are about to ask, yes, it can be used to subtitle a series or a movie, although, given those limitations, the subtitles may have to be finished by hand.

And if the folks at Kdenlive get their act together and integrate a translation module, it would be perfect.
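For that manual pass, it can help to know where the recognizer itself was unsure. The sketch below is not part of Kdenlive. It assumes a Vosk-style JSON result with word-level output enabled, where each entry in `result` carries `word`, `start`, `end` and a confidence value `conf`. The function name, the 0.8 threshold and the sample data are hypothetical.

```python
import json


def low_confidence_words(result_json, threshold=0.8):
    """Return (word, start_seconds, conf) for words below the threshold."""
    result = json.loads(result_json)
    return [
        (w["word"], w["start"], w["conf"])
        for w in result.get("result", [])
        if w["conf"] < threshold
    ]


# Hypothetical recognizer output: one word spoken over background music.
sample = json.dumps({
    "result": [
        {"conf": 0.98, "start": 1.0, "end": 1.4, "word": "clear"},
        {"conf": 0.42, "start": 2.1, "end": 2.6, "word": "mumble"},
        {"conf": 0.91, "start": 3.0, "end": 3.5, "word": "speech"},
    ],
    "text": "clear mumble speech",
})

for word, start, conf in low_confidence_words(sample):
    print(f"check '{word}' at {start:.1f}s (confidence {conf:.2f})")
```

The timestamps tell you where to look in the timeline, so you only have to review the doubtful passages instead of the whole video.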

There are things that could be improved. Today, if you want to change the appearance of the subtitles, you will have to insert code. Also, there is no way to export them: you will only be able to see them embedded in the video.

But, as I said above, without a doubt the project is on the right track.

LinuxAdictos
