Hi all,
I've been working on speech synthesis in Pd, building patches that reproduce what a spectral analysis of speech shows. That was doable, but it got me curious about emulating other kinds of vocal sound, like screaming and singing.
Any advice on how to accomplish this? I used [fft~] to analyze my own screams and searched online for analysis that had already been done. I found a couple of papers.
I've already run some experiments to simulate screaming. The FFT showed that, unlike a normal speaking voice, which has only a few spectral peaks and can be mimicked convincingly with a few band-pass filters, a scream has spectral peaks active throughout the audible spectrum.
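For anyone who wants to reproduce that comparison outside Pd, here is a minimal Python sketch of counting spectral peaks in one analysis frame. The two test frames are synthetic stand-ins, not my recordings, and the function names, frame size and threshold are all assumptions chosen for illustration.

```python
import cmath
import math

def magnitude_spectrum(signal):
    """Naive DFT magnitudes for bins 0..N/2 (slow, but fine for a short frame)."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(signal))
        mags.append(abs(acc) / n)
    return mags

def count_peaks(mags, rel_threshold=0.1):
    """Count local maxima that reach rel_threshold times the largest bin."""
    floor = rel_threshold * max(mags)
    return sum(
        1
        for k in range(1, len(mags) - 1)
        if mags[k] >= floor and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
    )

N = 256
# Hypothetical "speech-like" frame: three partials sitting on exact bins.
sparse = [sum(math.sin(2 * math.pi * b * i / N) for b in (8, 24, 40))
          for i in range(N)]
# Hypothetical "scream-like" frame: a partial every 8 bins up to Nyquist.
dense = [sum(math.sin(2 * math.pi * b * i / N) for b in range(8, 128, 8))
         for i in range(N)]

print(count_peaks(magnitude_spectrum(sparse)))
print(count_peaks(magnitude_spectrum(dense)))
```

On real recordings the partials will not land on exact bins, so a window function and a higher threshold would probably be needed before the counts mean much.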
First, I tried setting up a regular formant synth with a higher fundamental, raising the gain on the partials and adding in some noise. This sounded too much like whispering.
Second, I created parallel band-pass filters for the noise to try to make it sound less airy and more direct and forceful.
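Roughly what the noise-through-parallel-filters idea looks like as a Python sketch. The filter here is a generic two-pole band-pass biquad (the RBJ cookbook form), which is an assumption and not necessarily how Pd's own filter objects behave; the center frequencies and Q are hypothetical values, not the ones from my patch.

```python
import math
import random

def bandpass(signal, f0, q, fs=44100):
    """Two-pole band-pass biquad (RBJ cookbook form, unity gain at f0)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0  # b1 is zero for this filter type
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def filter_bank(signal, centers, q=10, fs=44100):
    """Run the same input through every band in parallel and average the outputs."""
    bands = [bandpass(signal, f0, q, fs) for f0 in centers]
    return [sum(samples) / len(centers) for samples in zip(*bands)]

rng = random.Random(0)
noise = [rng.uniform(-1, 1) for _ in range(4410)]  # 0.1 s of white noise
centers = [700, 1200, 2600, 3500, 5000]            # hypothetical band centers in Hz
shaped = filter_bank(noise, centers)
```

Raising `q` narrows each band, which should move the result from airy toward pitched; whether that ever gets to "forceful" is exactly what I'm unsure about.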
Third, I tried skipping the noise entirely and simply adding more [phasor~] objects wherever I saw partials. It took so many that they tended to cancel each other out or sound far too distorted to pass for a speaking or screaming voice.
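A small Python sketch of the oscillator-bank idea, in case it helps pin down where the distortion comes from. It makes two assumptions that differ from the patch described above: it uses sines rather than the ramp that [phasor~] outputs, and it divides by the summed amplitudes so the mix cannot exceed full scale. The partial list is hypothetical.

```python
import math

def additive(partials, n_samples, fs=44100):
    """Sum one sine per (freq_hz, amp) pair, scaled so the peak stays within +/-1."""
    total_amp = sum(amp for _, amp in partials)
    out = []
    for i in range(n_samples):
        t = i / fs
        s = sum(amp * math.sin(2 * math.pi * f * t) for f, amp in partials)
        out.append(s / total_amp)
    return out

# Hypothetical partial list: 40 harmonics of 400 Hz with a gentle rolloff.
partials = [(400 * k, 1 / math.sqrt(k)) for k in range(1, 41)]
frame = additive(partials, 2048)
```

Without the division by `total_amp`, 40 partials at these amplitudes could sum well past full scale, which would clip at the output.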
I'm open to any kind of response, theoretical or practical. I just can't wrap my head around the problem yet.