With regard to formant control in time-domain pitch shifting... I can't claim to be an expert here, and I haven't kept up with my reading in this area...
But if you look at the way formant synthesis was done in 70s and 80s computer music, it's usually playing windowed sinusoids (synchronized to the sound's fundamental frequency), where the frequency of the sinusoid determines the formant's center frequency, the length of the window determines the width of the formant (a shorter window gives a wider formant), and of course the period of the window playback determines the fundamental frequency (F.06, etc.). Similarly, you can do this (formant compressing/stretching) with any arbitrary single-cycle wavetable, by stretching or compressing it in time, independent of its playback period. In this case, the waveform is equivalent to an FIR filter kernel (which may be arbitrarily stretched) that's filtering an impulse train.
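To make that concrete, here is a minimal sketch in plain Python (no libraries) of the windowed-sinusoid idea. The function names, the Hann window, and the parameter choices are my own assumptions for illustration, not taken from any particular historical implementation.

```python
import math

def formant_grain(center_hz, length, sr):
    """One Hann-windowed sinusoid.

    center_hz sets the formant's center frequency; length (in samples)
    sets its width: a shorter grain gives a wider formant.
    """
    return [
        0.5 * (1.0 - math.cos(2.0 * math.pi * n / (length - 1)))
        * math.sin(2.0 * math.pi * center_hz * n / sr)
        for n in range(length)
    ]

def synthesize(f0_hz, center_hz, grain_len, sr, n_samples):
    """Overlap-add one grain per fundamental period.

    This is the same as filtering an impulse train (period sr / f0_hz)
    with the grain used as an FIR kernel.
    """
    out = [0.0] * n_samples
    grain = formant_grain(center_hz, grain_len, sr)
    period = sr / f0_hz
    k = 0
    while True:
        start = int(round(k * period))
        if start >= n_samples:
            break
        for i, g in enumerate(grain):
            if start + i < n_samples:
                out[start + i] += g
        k += 1
    return out

# Hypothetical settings: 100 Hz fundamental, one formant at 800 Hz.
voice = synthesize(f0_hz=100.0, center_hz=800.0, grain_len=200,
                   sr=8000, n_samples=1600)
```

Changing f0_hz moves the pitch and leaves the formant where it is; changing center_hz or grain_len moves or widens the formant and leaves the pitch alone.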
So shouldn't it basically be the case, then, that if you have period-synchronous windowed portions of the input signal, you can change the formant by stretching/shrinking each windowed portion in time, and change the pitch by changing the period of playback (i.e. the fundamental frequency)?
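A rough sketch of what I mean, again in plain Python. It assumes a lot: the period is already known, constant, and an integer number of samples, and the pitch marks simply sit at multiples of it. The names (extract_grains, stretch, shift) and the two-period Hann window with linear-interpolation resampling are my own choices, not any product's or patent's method.

```python
import math

def hann(n, length):
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * n / (length - 1)))

def extract_grains(signal, period):
    """Two-period, Hann-windowed grains centered on marks `period` apart."""
    length = 2 * period + 1
    grains = []
    for center in range(period, len(signal) - period, period):
        seg = signal[center - period : center + period + 1]
        grains.append([s * hann(i, length) for i, s in enumerate(seg)])
    return grains

def stretch(grain, factor):
    """Resample a grain to about `factor` times its length (linear interp)."""
    new_len = max(2, int(round((len(grain) - 1) * factor)) + 1)
    out = []
    for j in range(new_len):
        pos = j * (len(grain) - 1) / (new_len - 1)
        i = min(int(pos), len(grain) - 2)
        frac = pos - i
        out.append((1.0 - frac) * grain[i] + frac * grain[i + 1])
    return out

def shift(signal, period, pitch_ratio=1.0, formant_ratio=1.0):
    """Pitch comes from grain spacing, formants from grain stretching."""
    grains = extract_grains(signal, period)
    out = [0.0] * len(signal)
    new_period = period / pitch_ratio
    k = 0
    while True:
        center = int(round(period + k * new_period))
        if center >= len(signal) - period:
            break
        # Use the analysis grain nearest in time, so duration is preserved.
        idx = min(len(grains) - 1, max(0, int(round(center / period)) - 1))
        # Compressing the grain in time scales its spectrum (formants) up.
        g = stretch(grains[idx], 1.0 / formant_ratio)
        half = (len(g) - 1) // 2
        for i, v in enumerate(g):
            n = center - half + i
            if 0 <= n < len(out):
                out[n] += v
        k += 1
    return out

# Hypothetical test tone with a 40-sample period.
tone = [math.sin(2.0 * math.pi * n / 40) for n in range(400)]
octave_up_same_formants = shift(tone, 40, pitch_ratio=2.0)
same_pitch_formants_up = shift(tone, 40, formant_ratio=1.5)
```

With both ratios at 1.0 the overlapping Hann windows sum to one, so the interior of the signal comes back unchanged, which is a handy sanity check. A real implementation would need pitch-mark detection and a tracked, time-varying period, which is where most of the difficulty lives.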
This should be more or less equivalent to the smoothed cepstrum/whitened spectrum method of frequency-domain formant control (timbre stamp, etc.), if you think of the spectrum of the zero-padded "kernel" (one period of the original signal) as the cepstrally smoothed spectrum (i.e. the spectral envelope), and the variable playback period "impulse train" as the whitened spectrum.
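One way to check the kernel/impulse-train picture numerically: once a kernel is repeated every `period` samples, the harmonics of the result should be exactly the kernel's own spectrum sampled at multiples of the fundamental. The sketch below uses plain Python and a brute-force transform; the helper names are mine and purely illustrative.

```python
import cmath
import math

def dtft(x, omega):
    """Spectrum of a finite sequence at angular frequency omega (rad/sample)."""
    return sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(x))

def one_period(kernel, period):
    """Steady-state period of the kernel repeated every `period` samples."""
    out = [0.0] * period
    for n, v in enumerate(kernel):
        out[n % period] += v  # overlapping tails wrap around and add up
    return out

def harmonic_check(kernel, period):
    """Largest gap between harmonic k of the periodic signal and the
    kernel spectrum evaluated at 2*pi*k/period."""
    y = one_period(kernel, period)
    worst = 0.0
    for k in range(period):
        omega = 2.0 * math.pi * k / period
        worst = max(worst, abs(dtft(y, omega) - dtft(kernel, omega)))
    return worst

# Hypothetical kernel: a 50-sample Hann-windowed sinusoid.
kernel = [
    0.5 * (1.0 - math.cos(2.0 * math.pi * n / 49)) * math.sin(0.9 * n)
    for n in range(50)
]
gap_long_period = harmonic_check(kernel, 64)
gap_short_period = harmonic_check(kernel, 20)
```

Both gaps come out at rounding-error level, whatever the period: the kernel's spectrum plays the role of the envelope, and the playback period only decides where that envelope gets sampled.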
Roland also have some nice patents on this stuff, I think for their VP-9000 and V-Synth.
Of course this probably turns into a disaster on polyphonic or non-periodic signals (like other pitch shifters of this type)...