I think you're on the right track by filtering noise for consonants. They're mostly unpitched and spread their energy across a wide band of frequencies. I'm not familiar with Sonic Visualiser, but in my experience the displays on spectrum analyzers update too slowly to give you a good reading on something as short as a consonant. You might just want to use your ears and get as close as you can. Channel vocoders emulate consonants by listening for sounds with a lot of high-frequency content and switching over to filtered noise when they hear them. It's not perfect, since it's indiscriminate about which consonants trigger the noise, but because consonants are so short the result is often passable. So going one better and tuning the noise for each consonant might make a world of difference.
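To make the "tune the noise per consonant" idea a bit more concrete, here's a rough sketch in plain Python. It runs white noise through a single band-pass biquad (coefficients from the well-known RBJ audio EQ cookbook) and fades it out so it doesn't click. The function name `bandpass_noise`, the `CONSONANTS` table, and every center frequency, Q, and duration in it are my own guesses for illustration, not measured values, so treat them as starting points to tune by ear.

```python
import math
import random

SAMPLE_RATE = 44100  # assumed sample rate


def bandpass_noise(center_hz, q, duration_s, seed=0):
    """Return a short burst of white noise run through a band-pass biquad."""
    rng = random.Random(seed)
    n = int(SAMPLE_RATE * duration_s)

    # RBJ cookbook band-pass coefficients (constant 0 dB peak gain).
    w0 = 2 * math.pi * center_hz / SAMPLE_RATE
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha

    x1 = x2 = y1 = y2 = 0.0
    out = []
    for i in range(n):
        x0 = rng.uniform(-1.0, 1.0)  # white noise sample
        y0 = (b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x0
        y2, y1 = y1, y0
        # Linear fade-out so the burst ends without a click.
        out.append(y0 * (1 - i / n))
    return out


# Hypothetical settings: (center frequency in Hz, Q, duration in seconds).
# These are guesses to be adjusted by ear, not analysis results.
CONSONANTS = {
    "s": (6000.0, 2.0, 0.12),
    "sh": (3000.0, 2.0, 0.12),
    "t": (4500.0, 1.0, 0.03),
}

bursts = {name: bandpass_noise(*params) for name, params in CONSONANTS.items()}
```

Each entry in `bursts` is just a list of float samples, so you can write it to a WAV file or mix it in ahead of a vowel. A real consonant will probably want more than one filter band and a shaped attack, but even this much lets you hear how moving the center frequency changes which consonant the noise suggests.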
I've attached a little formant synth I made a year or so ago. It's meant to be sort of a "talking" synth, but it only says the vowels "a," "e," "i," "o," and "u." Press the matching key on your QWERTY keyboard, then play some notes to hear that vowel. It also has a silly little tune built in. I think it's hilarious ;-p.