AI-generated voices are beyond my expectations

October 28, 2022

At first, I integrated audio files uploaded to Wiktionary into the Wordbook. While checking some of the files, I found a few issues.

The volume differed from file to file: some were loud, and some were quiet. I tried to fix the problem with software that adjusts volume levels, but it did not work as well as I had hoped.
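To give an idea of what adjusting volume levels involves, here is a minimal sketch of one common approach, RMS normalization, on raw 16-bit samples. It is not the software I used. The function names and the target level of 3000 are assumptions chosen for illustration, and real audio files would first need to be decoded into samples.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_rms(samples, target_rms=3000.0, limit=32767):
    """Scale samples so their RMS level approaches target_rms.

    target_rms is an arbitrary illustrative level, and limit is the
    largest value a signed 16-bit sample can hold.
    """
    current = rms(samples)
    if current == 0:
        # Silence (or an empty clip) cannot be scaled to a target level.
        return list(samples)
    gain = target_rms / current
    peak = max(abs(s) for s in samples)
    # Cap the gain so the loudest sample does not clip.
    gain = min(gain, limit / peak)
    return [int(round(s * gain)) for s in samples]


quiet = [100, -100, 100, -100]
loud = [20000, -20000, 20000, -20000]
print(rms(normalize_rms(quiet)), rms(normalize_rms(loud)))
```

Matching a numeric level like this does not guarantee that two recordings sound equally loud, since background noise and different voices also affect how loud a file seems.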

The biggest issue was that most words had no audio file at all. I wished some people would record pronunciations of the words or give me recorded audio files.

Open-Source Text-to-Speech

I knew there were AI projects for text-to-speech. AI was my last hope at that point, so I started testing it.

I picked a few open-source projects with licenses that are easy to work with. I generated some files, and the results were very impressive. I was so happy with them that I generated thousands of files for essential words and integrated them into the Wordbook.

The happiness did not last long.

As I kept testing, I found that some words were not pronounced properly. The more I tested, the more of those files I found. The AI models were trained on speech recordings, so they may have difficulty with words that were not included in the training data.

Although the output of these AI projects was amazing, it was not suitable for my project. My last hope was fading away.

Amazon Polly

I wondered how good commercial AI could be. I do not have much of a budget for the Wordbook, so I had been trying to avoid commercial AI products as much as possible. I had not even looked at the commercial ones. (I should have.)

After some research and testing the demo, I decided to use Amazon Polly.
Here are the reasons I went with Amazon Polly:

  • The sound quality was splendid, and it blew my mind.
  • It lets users retain ownership of generated content.
  • There is a free tier.
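For readers curious about what generating a word with Amazon Polly looks like, here is a minimal sketch using the boto3 SDK. It is not the exact script behind the Wordbook. The helper names, the "Joanna" voice, the MP3 output format, and the file naming are assumptions for illustration, and running main() requires boto3 plus AWS credentials.

```python
from pathlib import Path


def synthesize_word(polly_client, word, voice_id="Joanna"):
    """Ask Polly to speak one word and return the MP3 bytes.

    voice_id is an illustrative choice, not necessarily the voice
    used for the Wordbook.
    """
    response = polly_client.synthesize_speech(
        Text=word,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    # AudioStream is a file-like object holding the generated audio.
    return response["AudioStream"].read()


def save_word_audio(polly_client, word, out_dir, voice_id="Joanna"):
    """Write the generated audio to <out_dir>/<word>.mp3 (hypothetical naming)."""
    path = Path(out_dir) / f"{word}.mp3"
    path.write_bytes(synthesize_word(polly_client, word, voice_id))
    return path


def main():
    # Imported here so the helpers above can be used without boto3 installed.
    import boto3

    client = boto3.client("polly")
    for word in ["hello", "dictionary"]:
        print(save_word_audio(client, word, "."))


# Call main() manually once AWS credentials are configured.
```

Passing the client in as a parameter keeps the helpers easy to try with a stand-in object before any request counts against the free tier.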

AI-Generated Audio

I am sharing the generated audio files here. I hope they give you an idea of what AI-generated audio sounds like and of the differences I mentioned earlier.

Amazon Polly

Hello
Dictionary

Open-Source

Hello
Dictionary