Speech Formant Synthesizer (With bonus Turkish vowels pack :P)

jimqode

Here is my first shot at a speech formant synthesizer. It only does vowels for now, because I haven't yet figured out how to extract the envelopes needed for consonants from my recordings. I also don't know whether a simple envelope will be enough for them. Any pointers to online resources on speech synthesis would be greatly appreciated.

Formants are fully customizable through text files, so it should be straightforward to adapt it to another language.

All comments (even flame :P) welcome.

http://www.pdpatchrepo.info/hurleur/formant.zip
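
For readers who want the general idea without opening the patch, here is a minimal Python sketch of parallel formant synthesis: a buzzy source fed through a few resonant filters tuned to the formant frequencies. It is not the patch's actual implementation. The formant table holds rough, commonly quoted values used purely for illustration (not the ones in formant.zip), and every function name here is hypothetical.

```python
import math

SAMPLE_RATE = 44100

# Rough, commonly quoted formant centre frequencies in Hz (F1, F2, F3).
# Illustrative assumptions only, not the values shipped with the patch.
FORMANTS = {
    "a": (730.0, 1090.0, 2440.0),
    "i": (270.0, 2290.0, 3010.0),
    "u": (300.0, 870.0, 2240.0),
}

def resonator(signal, freq, bandwidth, sr=SAMPLE_RATE):
    """Two-pole resonant filter centred on freq (Hz)."""
    r = math.exp(-math.pi * bandwidth / sr)          # pole radius from bandwidth
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq / sr)
    a2 = -r * r
    gain = 1.0 - r                                   # crude level compensation
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = gain * x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def sawtooth(pitch, n_samples, sr=SAMPLE_RATE):
    """Naive sawtooth in [-1, 1) standing in for the glottal buzz."""
    return [2.0 * ((i * pitch / sr) % 1.0) - 1.0 for i in range(n_samples)]

def synth_vowel(vowel, pitch=110.0, seconds=0.25, bandwidth=80.0):
    """Sum parallel resonators driven by the same source, then normalise."""
    n = int(seconds * SAMPLE_RATE)
    source = sawtooth(pitch, n)
    bands = [resonator(source, f, bandwidth) for f in FORMANTS[vowel]]
    mix = [sum(samples) for samples in zip(*bands)]
    peak = max(abs(s) for s in mix) or 1.0
    return [s / peak for s in mix]
```

Calling `synth_vowel("a")` returns a list of floats in [-1, 1] that could be written to a WAV file. Adapting it to another language would, under these assumptions, only mean editing the `FORMANTS` table.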

ultrasonic

Thanks, I'm quite amazed at how clear the vowels are - excellent! Sorry I can't help you with further links about speech synthesis beyond the ones you would find with Google yourself.
Also, formants for consonants? I was under the impression that formants exclusively describe vowels. I had also assumed that speech synthesis uses filtered noise for the various consonants.

jimqode

Thank you! I'm not too happy with the "u" vowel but the others sound quite clear to me too.
I am also trying to get the consonants by filtering a noise source, with the filters controlled by vline~ envelopes. I was able to analyse the formant frequencies using a free program called Sonic Visualiser, but I couldn't quite grasp how to analyse the consonants with it. There are quite good tutorials on the web for formant synthesis, but I couldn't find anything about consonant synthesis. So I'm playing with prosody for now until I find a good resource.

Maelstorm

I think you're probably on the right track by filtering noise for consonants. They're unpitched and contain a lot of frequencies. I'm not familiar with Sonic Visualiser, but in my experience the displays on spectral analyzers are pretty slow and won't give you a good reading on consonants. You might just want to use your ears and get as close as possible. Channel vocoders emulate consonants by listening for sounds with high-frequency content and switching over to filtered noise when they hear them. This isn't perfect, since the vocoder is indiscriminate about which consonants trigger the noise, but because consonants are so short the result is often passable. So going one better and tuning the noise for each consonant might make a world of difference.

I've attached a little formant synth I made a year or so ago. It's actually meant to be sort of a "talking" synth, but it only says the vowels "a," "e," "i," "o," and "u." You press the corresponding key on your QWERTY keyboard and then play some notes to hear them. It also has a silly little tune built in. I think it's hilarious ;-p.

http://www.pdpatchrepo.info/hurleur/formantsynth.mmb.zip

ichabod

I think the hard part is making the consonant-vowel and consonant-consonant transitions sound intelligible. But if

, it certainly can be done in Pd (and I want to know if anyone's done it).

I like the buzzy robotic sound that this implementation has.

mynamewontfitin

You can usually treat consonants like these as filtered noise, with a very quick transition into whatever vowel comes next. I think the centre frequency of the noise and the transition are what distinguish individual consonants from one another.
I found this page quite helpful:

https://ccrma.stanford.edu/CCRMA/Courses/152/speech_recognition.html
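
A rough Python sketch of that idea, under some loud assumptions: the consonant is just white noise through one resonant filter, the vowel is a sawtooth through parallel resonators, and the "transition" is a plain amplitude crossfade rather than a proper formant glide. All names, timings and bandwidths are hypothetical placeholders to be tuned by ear.

```python
import math
import random

SAMPLE_RATE = 44100

def resonator(signal, freq, bandwidth, sr=SAMPLE_RATE):
    """Two-pole resonant filter centred on freq (Hz)."""
    r = math.exp(-math.pi * bandwidth / sr)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq / sr)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (1.0 - r) * x + a1 * y1 - r * r * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def normalise(signal):
    """Scale so the largest absolute sample is 1 (silence is left as is)."""
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]

def consonant_vowel(noise_freq, vowel_formants, pitch=110.0,
                    burst_ms=40.0, fade_ms=15.0, vowel_ms=200.0, seed=0):
    """Filtered-noise burst that crossfades quickly into a formant vowel."""
    rng = random.Random(seed)
    burst_n = int(SAMPLE_RATE * burst_ms / 1000.0)
    fade_n = max(1, int(SAMPLE_RATE * fade_ms / 1000.0))
    total = burst_n + int(SAMPLE_RATE * vowel_ms / 1000.0)

    # Consonant part: white noise shaped by a wide resonance at noise_freq.
    noise = [rng.uniform(-1.0, 1.0) for _ in range(total)]
    hiss = normalise(resonator(noise, noise_freq, 1000.0))

    # Vowel part: sawtooth buzz through parallel formant resonators.
    buzz = [2.0 * ((i * pitch / SAMPLE_RATE) % 1.0) - 1.0 for i in range(total)]
    bands = [resonator(buzz, f, 80.0) for f in vowel_formants]
    vowel = normalise([sum(s) for s in zip(*bands)])

    out = []
    for i in range(total):
        # Weight is 0 during the burst and ramps to 1 over its last fade_n samples.
        w = min(1.0, max(0.0, (i - (burst_n - fade_n)) / fade_n))
        out.append((1.0 - w) * hiss[i] + w * vowel[i])
    return out

# Hypothetical "s"-like hiss into an "a"-like vowel; the numbers are guesses.
syllable = consonant_vowel(5000.0, (730.0, 1090.0, 2440.0))
```

Changing `noise_freq` and `fade_ms` is the quickest way to hear how much the noise colour and the speed of the transition change the perceived consonant.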

ShawnPD

yea! 8 ]

Navanod

The thread has been quiet for some time, but I'd also like to thank you for the formant sample Pd files. They're really useful, and hopefully once I get back into the Pd interface I can make proper use of them. One question about jimqode's example: I seem to get some clipping artifacts when changing vowels, and was wondering if there is a way to remove them? I added a little lop~, which seems to help a little, but only at lower frequency settings. I'm running on ASIO drivers, so I hope I can at least rule out my soundcard (LYNX).

I'm working on a kind of random vowel generator, so what I'm heading for is vowels that flow in a speech-like way without being understandable.
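
One common cause of clicks when a synth parameter changes is an abrupt jump in a filter frequency or gain. Whether that is what happens in this particular patch is only a guess, but the usual remedy is to ramp the value over a few milliseconds (in Pd, typically with line~ or vline~) instead of setting it instantly. Here is a minimal Python sketch of the principle, with hypothetical names and an arbitrary 20 ms glide:

```python
import math

SAMPLE_RATE = 44100

def ramp(start, target, ms, sr=SAMPLE_RATE):
    """Per-sample linear ramp from start to target, similar in spirit to line~."""
    n = max(1, int(sr * ms / 1000.0))
    step = (target - start) / n
    return [start + step * (i + 1) for i in range(n)]

def gliding_resonator(signal, freqs, bandwidth, sr=SAMPLE_RATE):
    """Two-pole resonator whose centre frequency is supplied per sample."""
    r = math.exp(-math.pi * bandwidth / sr)
    y1 = y2 = 0.0
    out = []
    for x, f in zip(signal, freqs):
        # Recompute the coefficient every sample so the frequency never jumps.
        a1 = 2.0 * r * math.cos(2.0 * math.pi * f / sr)
        y = (1.0 - r) * x + a1 * y1 - r * r * y2
        out.append(y)
        y2, y1 = y1, y
    return out

# Hypothetical example: glide a first formant from 730 Hz to 270 Hz in 20 ms.
f1 = ramp(730.0, 270.0, 20.0)
buzz = [2.0 * ((i * 110.0 / SAMPLE_RATE) % 1.0) - 1.0 for i in range(len(f1))]
smooth = gliding_resonator(buzz, f1, 80.0)
```

With the ramp, the centre frequency moves by about half a hertz per sample instead of 460 Hz in one step. The same treatment applies to any gain that changes along with the vowel.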

atux

One could add consonants and make it speak words. I don't know if anyone has ever developed this in Pd. I think it's extremely complex. It would be very interesting, because the words could be intoned (sung), giving you a sung text.
a.

atux

I found the open-source VocalTractLab code.
It would be interesting to create some C++ libraries for Pd.