Upsampling audio signals using automatable sampling frequency?

acreil

With regard to formant control in time domain pitch shifting... I can't claim to be an expert here, and I haven't kept up with my reading in this area...

But if you look at the way formant synthesis was done in 70s and 80s computer music, it's usually playing windowed sinusoids (synchronized to the sound's fundamental frequency), where the frequency of the sinusoid determines the formant's center frequency, the length of the window determines the width of the formant, and of course the period of the window playback determines the fundamental frequency (FOF, etc.). And similarly you can do this (formant compressing/stretching) with any arbitrary single cycle wavetable, by stretching or compressing it in time, independent of its playback period. In this case, the waveform is equivalent to an FIR filter kernel (which may be arbitrarily stretched) that's filtering an impulse train.
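To make the "FIR kernel filtering an impulse train" picture concrete, here is a minimal Python sketch. The function names, the choice of a Hann window, and the sample values in the usage note are assumptions for illustration only; this is not the code of any particular synth.

```python
import math

def hann(n, length):
    """Hann window value at sample n of a window `length` samples long."""
    return 0.5 - 0.5 * math.cos(2.0 * math.pi * n / length)

def formant_grain(formant_hz, grain_len, sample_rate):
    """One windowed sinusoid burst. Its frequency sets the formant center,
    its length sets the formant width. This is the 'FIR kernel'."""
    return [hann(n, grain_len) * math.sin(2.0 * math.pi * formant_hz * n / sample_rate)
            for n in range(grain_len)]

def grain_train(grain, period, total_len):
    """Overlap-add one copy of `grain` every `period` samples.
    Equivalent to filtering an impulse train of that period with the grain."""
    out = [0.0] * total_len
    for start in range(0, total_len, period):
        for i, g in enumerate(grain):
            if start + i < total_len:
                out[start + i] += g
    return out
```

For example, `grain_train(formant_grain(1000.0, 160, 8000), 100, 1000)` should give an 80 Hz tone with a formant around 1 kHz; changing only `period` moves the pitch and leaves the grain (the formant) alone.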

So shouldn't it basically be the case, then, that if you have period-synchronous windowed portions of the input signal, that you can change the formant by stretching/shrinking the original signal in time, and change the pitch by changing the period of playback (i.e. fundamental frequency)?

This should be more or less equivalent to the smoothed cepstrum/whitened spectrum method of frequency domain formant control (timbre stamp, etc.), if you think of the spectrum of the zero-padded "kernel" (one period of the original signal) as the smoothed cepstrum, and the variable playback period "impulse train" as the whitened spectrum.

Roland also have some nice patents on this stuff, I think for their VP-9000 and V-Synth.

Of course this probably turns into a disaster on polyphonic or non-periodic signals (like other pitch shifters of this type)...

katjav

@acreil said:

But if you look at the way formant synthesis was done in 70s and 80s computer music, it's usually playing windowed sinusoids (synchronized to the sound's fundamental frequency), where the frequency of the sinusoid determines the formant's center frequency, the length of the window determines the width of the formant, and of course the period of the window playback determines the fundamental frequency

That must be it.

Intuitively, the VOSIM voice simulator was on my mind all the time when thinking about formant-preserving pitch shifting, but I didn't see the connection till I suddenly realized the analogy with the windowed periods in Gibson's method, last night when I was actually trying to fall asleep.

VOSIM was developed by people at the Sonology Institute (then based in Utrecht, Holland) in the seventies as a function generator, and later translated to digital form as part of a computer music program by Gottfried Michael Koenig.

I've played around with the analog VOSIM when following courses at the Sonology Institute a couple years ago. It was my favourite module, because of the strong formant character. I never made any electronic compositions in the analog studio, like my fellow students did and which they proudly presented in the conservatory's pretentious concert hall. Instead, I was staring for hours at the huge four-channel oscilloscope and trying to understand relations between wave shapes and sound character.

It may be possible to emulate VOSIM in Pd. It is basically a raised sine train with a user-selectable amount of cycles and width (frequency), windowed at the end only, which you can play at an arbitrary speed to get the pitch.
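A rough Python sketch of one VOSIM-style period, following the description above: a few raised-sine (sine squared) pulses followed by silence. The per-pulse `decay` factor is an assumption about how the "windowed at the end" shape could be obtained, and all names and defaults are hypothetical. It is not a port of any existing patch or of the hardware.

```python
import math

def vosim_period(pulse_len, num_pulses, period_len, decay=0.8):
    """One period of a VOSIM-like wave, as a list of samples.

    pulse_len  : samples per raised-sine pulse (sets the formant frequency)
    num_pulses : number of pulses before the silence gap
    period_len : total samples per period (sets the pitch)
    decay      : amplitude ratio between successive pulses (assumed)
    """
    if pulse_len * num_pulses > period_len:
        raise ValueError("pulses do not fit in one period")
    out = [0.0] * period_len
    amp = 1.0
    for k in range(num_pulses):
        for n in range(pulse_len):
            # sin^2 is a sine raised above zero at twice the base rate
            out[k * pulse_len + n] = amp * math.sin(math.pi * n / pulse_len) ** 2
        amp *= decay
    return out
```

Repeating the returned list gives the tone. Changing `period_len` alone changes the pitch while the pulses, and so the formant, stay put.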

A similar experiment could be done with a single period cut out of a recorded voice. Then we could evaluate the effect of Lent's method before developing the whole complicated analyzer thing. The challenge (in Pd) is how to control a table playback interval independently from the playback rate, without relying on metro or other control rate objects.

Katja

mod

if you do a search, i think there is an existing VOSIM patch in pd somewhere.

edit: here: http://markmail.org/message/qwolxnsxinaazaso

elden

VOSIM has at least two components:
a signal at a certain frequency >triggered within and synchronous with a window< - giving the spectrum of the sound
and >the overall frequency at which this windowed signal is triggered< - giving the fundamental frequency.

I've got 2 other questions regarding the topic:
1.) is windowing causing amplitude modulations when being triggered at/by zero crossings? if not, could that cause formant stretching or compression without granular artifacts?
2.) is upsampling stabilizing a signal's spectrum when being pitched? if yes, would downsampling (to the original rate) of a formerly upsampled and pitched signal recreate the original formants?

katjav

Yes the vosim patch by Frank Barknecht clearly shows the effect, because it has a very strong sense of formant with some settings. And then you can change the pitch without changing the formant. Thanks for the link, mod.

Windowed overlap of audio segments taken at different positions in time always causes amplitude modulations if there is a harmonic recipe involved. That is because when one harmonic is in phase at the overlap, another harmonic is out of phase at the overlap. So I am not sure what will happen when the periods are cut exactly to length and overlapped. It's a matter of trying it out in a Pd patch.

Katja

elden

@katjav said:

Yes the vosim patch by Frank Barknecht clearly shows the effect, because it has a very strong sense of formant with some settings. And then you can change the pitch without changing the formant.

Yes, but the wave cycle that is played back windowed in VOSIM is >artificially synthesized< and >separately triggered< for every single windowed impulse and eventually there's a silence gap of adjustable length between every impulse that is influencing formants, too. How would you turn that functioning into a working algorithm for the use with an incoming stream of audio?

acreil

Here are the Roland patents:

http://www.google.com/patents/US6201175?printsec=abstract&dq=6201175&ei=JwMTT_3MNanW0QHl0Z27Aw#v=onepage&q=6201175&f=false
multi-band something or other

http://www.google.com/patents/US6564187?printsec=abstract&dq=6564187&ei=XgMTT6GwGaXV0QHM_tyPAw#v=onepage&q=6564187&f=false
multi-rate processing

http://www.google.com/patents/US6721711?printsec=abstract&dq=6721711&ei=GQQTT_mXCIfv0gHNi7GiAw#v=onepage&q=6721711&f=false
timescale-pitch stuff, includes stuff about formants

http://www.google.com/patents/US20040144237?printsec=abstract&dq=10_719_872&ei=0AQTT6HXCeTm0QGE86SmAw#v=onepage&q&f=false
looks like a general overview

I don't know how useful or relevant this stuff is. I only briefly skimmed it. And there could be more.

acreil

@elden said:

Yes, but the wave cycle that is played back windowed in VOSIM is >artificially synthesized< and >separately triggered< for every single windowed impulse and eventually there's a silence gap of adjustable length between every impulse that is influencing formants, too. How would you turn that functioning into a working algorithm for the use with an incoming stream of audio?

Instead of the windowed sinusoid, you'd be playing one period of the original input signal, without scaling it. The result should be smooth if the endpoints are zero. The zero padding you get from the silence gap effectively interpolates and resamples the harmonics of the input waveform to make a new signal with a scaled pitch but the same formant.

elden

the problem regarding a kind of input-stream-VOSIM system is that what we would get is formant filtering, like you could do with a dedicated formant or spectral filter, not an increase or decrease of the distance between the harmonics relative to each other.
I think we must think of formant shifting in kinds of spectrum stretching or shrinking along the frequency domain, the way samplers do the chipmunk effect.
the only issue is, how to move the overtone spectrum around a fixed fundamental frequency NOT using any FFT.
I think that's where the upsampling comes in, anyhow.

i just found out that - using usual granular pitch shifting - if you use smaller grain length than the input fundamental wave cycle, the fundamental frequency stays the same while the rest of the spectrum shifts (but doesn't stretch along the frequency axis). Maybe that's caused by an effect similar to what VOSIM does...sounding extremely cheap.

katjav

@acreil said:

Instead of the windowed sinusoid, you'd be playing one period of the original input signal, without scaling it. The result should be smooth if the endpoints are zero.

Cutting a signal at a zero crossing does not per se make it smooth. As soon as you cut periods apart and insert zeros in between, you'll be confronted with the effects of a rectangular window. This creates extra frequencies all over the spectrum, aliases included. In Gibson's method, a Hann window is used on the periods. This makes the cuts smoother, but still introduces new frequencies: the products of the input frequencies and the cosine term in the window.

Normally when Hann windowing is used for Fourier transform, the overlap exactly undoes the product frequencies, because the summed amplitude of two (or four, etc.) evenly overlapping Hann windows is a constant. Apart from possible amplitude scaling, the output is equal to the input if you don't do anything to the spectrum.

In Gibson's method the intention is to stretch or compress the rate at which windowed segments are sent to the output. In that case, the amplitude of overlapping Hann windows cannot be a constant. And it must not be a constant, because the net effect of the whole operation would then be zero. So there will be amplitude modulations which generate extra frequencies, and the question is still whether these extra frequencies fit in the harmonic recipe and do not spoil the result.
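The point about the window sum is easy to check numerically. Below is a small Python sketch (function names are made up for this post) that sums Hann windows at a given hop size and measures how far the summed envelope deviates from a constant.

```python
import math

def hann(n, length):
    """Periodic Hann window value at sample n."""
    return 0.5 - 0.5 * math.cos(2.0 * math.pi * n / length)

def overlap_envelope(length, hop, num_windows=16):
    """Sum of `num_windows` Hann windows, each starting `hop` samples
    after the previous one."""
    total = [0.0] * (hop * (num_windows - 1) + length)
    for w in range(num_windows):
        for n in range(length):
            total[w * hop + n] += hann(n, length)
    return total

def ripple(length, hop, num_windows=16):
    """Peak-to-peak variation of the summed envelope, ignoring the
    fade-in and fade-out at both ends."""
    env = overlap_envelope(length, hop, num_windows)
    steady = env[length:len(env) - length]
    return max(steady) - min(steady)
```

With a 64-sample window, `ripple(64, 32)` and `ripple(64, 16)` come out as zero up to rounding error, while an arbitrary hop such as `ripple(64, 40)` gives a clearly nonzero value. That nonzero ripple is the amplitude modulation in question.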

@acreil said:

The zero padding you get from the silence gap effectively interpolates and resamples the harmonics of the input waveform to make a new signal with a scaled pitch but the same formant.

That is a crucial observation. A similar effect holds for signal tails folding back on themselves, the complement to zero padding. Zero padding and fold over both do alias-free resampling.
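Here is a minimal Python sketch of that observation, with hypothetical names and plain DFT sums rather than an FFT. It places one segment in a frame of the new period length, zero padding if the frame is longer and folding the tail back if it is shorter, and compares each harmonic of the result with the continuous spectrum of the original segment at the same frequency.

```python
import cmath
import math

def dtft(x, freq):
    """Spectrum of the finite sequence x at `freq` (cycles per sample)."""
    return sum(v * cmath.exp(-2j * math.pi * freq * n) for n, v in enumerate(x))

def retriggered_harmonic(x, new_period, k):
    """k-th harmonic (unnormalized) of x retriggered every `new_period`
    samples. Shorter x is zero padded; longer x folds back and overlaps."""
    frame = [0.0] * new_period
    for n, v in enumerate(x):
        frame[n % new_period] += v
    return dtft(frame, k / new_period)

# One 64-sample cycle with a fundamental and a third harmonic (arbitrary test data).
segment = [math.sin(2 * math.pi * n / 64) + 0.3 * math.sin(6 * math.pi * n / 64)
           for n in range(64)]
```

For any `new_period` and `k`, `retriggered_harmonic(segment, new_period, k)` matches `dtft(segment, k / new_period)`: the new harmonics are just samples of the segment's own spectral envelope taken at a different spacing.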

I am trying to model variable trigger rate of a period stored in an array to make the effects audible. A first patch is LentModel01.pd (attached), where a single period of a windowed or non-windowed sinewave is triggered at variable rate. The patch makes clear where the problems are. 'Rectangular windowing', i.e. no windowing, makes a zero padded sine sound rather like a sawtooth. The Hann window produces a sine with harmonics right from the start. I've not implemented overlap yet.

It seems that the copy-and-paste of the periods should be done symmetrically round the center of the period, like you would do with a zero-padded or fold-over Fourier transform. That is for the next patch then.

Katja

http://www.pdpatchrepo.info/hurleur/LentModel01.pd

acreil

I was thinking that generating extra frequencies was the crucial step. Imagine the Fourier series of a periodic square waveform (1/f, odd harmonics, etc.) compared to the Fourier transform of an aperiodic pulse (sinc). It becomes obvious that the periodic square waveform samples the sinc, and by sampling it at different rates, you can scale the "formant" (i.e. change the pulsewidth). So if you were to use this as a constant formant (i.e. constant pulse duration, regardless of frequency), you could have a 50% duty cycle pulse wave at C3 and a 25% duty cycle pulse wave at C2. The sinc function is constant and you're just changing the points where it's sampled. But at C4 you'd get a 100% duty cycle pulse wave, which is just DC. This isn't a problem, it just means you've sampled only the null points of the sinc function.
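The pulse example can be written out in a few lines of Python. This is only a sketch with made-up function names, using the standard Fourier series of an ideal (continuous-time) rectangular pulse train; width and period are in the same arbitrary time unit.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def pulse_envelope(freq, width):
    """Spectrum magnitude of a single rectangular pulse of the given width.
    This is the fixed 'formant' that the harmonics sample."""
    return width * abs(sinc(freq * width))

def pulse_harmonic(k, width, period):
    """Magnitude of the k-th Fourier coefficient when the pulse repeats
    every `period`: the envelope sampled at k / period, scaled by 1 / period."""
    return pulse_envelope(k / period, width) / period
```

With `width = 1.0`, a period of 2.0 is the 50% duty cycle case (odd harmonics at 1/(pi k), even harmonics zero), a period of 4.0 is the 25% case one octave down, and a period of 1.0 gives zero for every harmonic except DC, because every harmonic lands on a null of the sinc.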

Anyway, if the idea is to obtain a constant formant, you want to construct the sinc from the 1/f spectrum, so you have to end up with extra frequencies. You won't get the correct result if you smoothly window the pulse, or maybe multiple periods of the pulse. You're effectively treating the input waveform as an FIR filter kernel and using it to filter an impulse train of arbitrary frequency.

In the sine's case you'd expect to just get some spectrum that converges to a sine at certain frequencies. It might not be a good example of a formant.

I dunno, maybe the "smoothness" could be adjustable. I think a smoother waveform would result in more pronounced peaks in the formant. Maybe that's useful.

I actually have a hardware synth (Wersi MK1/EX20) that does this. It just does static additive synthesis, but has a special formant mode that stretches or compresses the waveform. On a spectrogram you can see that it does produce a constant formant. But it just cuts the waveform off rather than overlap multiple copies, so the formant is distorted when transposing up. And it gets buzzy at low pitches, but that's exactly what you'd expect.

katjav

After all the technically detailed patent applications I was happy to read this interview with Fred Speckeen, who worked with Brian Gibson on the Digitech Vocalist series:

http://www.soundonsound.com/sos/1996_articles/aug96/ivl.html

A few excerpts:

"Our system can determine pitch in about two cycles of the signal, depending on the transients. No single technique works in all circumstances, so we use a hybrid approach that looks at a variety of signal features and statistics, before coming up with a pitch determination."

"The basic problem is that with traditional methods, you get the chipmunk effect, because as the pitch moves up, the head and body seem to get smaller. Our technology moves the pitch but keeps the body the same size. Because our goal was to make affordable products that produce the harmonies in real time, we didn't choose complex frequency-domain analysis/resynthesis methods. Instead, we turned to our pitch recognition expertise and implemented pitch-synchronous techniques that let us do all the processing in the time domain."

"With the Digitech Studio Vocalist we have added gender changing, where you can switch the apparent sex of the shifted voices."

The original technique is now over 20 years old. It's really time to have it in open source. Let's just dissect the patent text to the bone and reproduce it in C/Pd.

Katja

acreil

I guess Digitech's patents would cover the Whammy pedals also. They are very smooth and low latency for clean monophonic signals. And they also sound interesting when used on inappropriate signals (Ween's "Mister, Would You Please Help My Pony?").

Jon Dattorro details his design of the Lexicon 2400: similar idea, but better suited to general purpose material.

https://ccrma.stanford.edu/~dattorro/machines.html

I'd always admired the designs of the "classics", i.e. H949 with "deglitch" board. I'd looked into building one in Pd, but it seemed impossible to just do it as an abstraction... so I'd love to see some solution to this. I'd be glad to do some reading and analysis, but I'm kinda busy for the moment.

katjav

I bought the 'cheap' Lexicon LXP5 reverb when it was just out; well, that was still an expensive box for a hobby musician. It has a pitch shifter on board which became my favorite effect. Later came a Digitech GNX which can even do arpeggios with the help of its pitch shifter. So cool.

It's great that we have fast computers now, Pure Data, and all those patents which are now expired. Even so, building a good pitch shifter will be a matter of many months or more, if we succeed at all.

Period-tracking is the hardest part of all. As it seems, this is mostly done by autocorrelation. I've experimented with autocorrelation via frequency domain, and soon found that it is efficient, but must probably be zero-padded to work well. Elden mentioned in an earlier post:
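For reference, a bare-bones period estimate by autocorrelation looks like this in Python. Note this is the direct time-domain sum, not the frequency-domain version mentioned above, and it is only a sketch: the names are hypothetical, and it has no voicing decision, no interpolation between lags, and no guard against octave errors beyond the lag range you give it.

```python
import math

def autocorrelation(x, lag):
    """Mean product of the signal with itself shifted by `lag` samples."""
    n = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(n)) / n

def estimate_period(x, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the highest autocorrelation."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(x, lag))

# Test signal: fundamental with a 50-sample period plus a second harmonic.
signal = [math.sin(2 * math.pi * i / 50) + 0.5 * math.sin(4 * math.pi * i / 50)
          for i in range(400)]
period = estimate_period(signal, 20, 80)
```

The lag range matters: if `max_lag` reaches twice the true period, the peak there is just as high as the one at the true period, which is one reason real trackers combine several features.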

@elden said:

the necessary loop points are derived from the helix-calculation

How does that work? Is this a better method, replacing autocorrelation?

Katja

elden

@katjav said:

How does that work? Is this a better method, replacing autocorrelation?

Actually it's more a model than a calculation. It clearly shows the phase angles of a sound at certain positions in time with respect to its zero-crossings, where the phase angle is zero, of course. with this visual information you can set the loop points at the same phase angles, which makes the loop smooth, even if you change the loop points' positions while playing back the looped cycle. what you create is a wavetable. the smoothest amp-time-domain timestretching method known.
to get the right cycle length, which sets the distance between the loop points (cycle length = loop length) so that they fit at the exact phase angles, you surely would need a method of pitch tracking, too...
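As a very reduced illustration of the loop-point idea (not the helix model itself), here is a Python sketch that finds rising zero crossings and picks two of them one cycle apart as loop points. The function names and the test signal are assumptions; on real material with several zero crossings per cycle you would still need the pitch tracker to decide which crossings belong together.

```python
import math

def rising_zero_crossings(x):
    """Indices i where the signal goes from negative to non-negative."""
    return [i for i in range(1, len(x)) if x[i - 1] < 0.0 <= x[i]]

def loop_points(x, cycle_len):
    """First pair of rising zero crossings exactly `cycle_len` apart,
    or None if there is no such pair."""
    crossings = rising_zero_crossings(x)
    wanted = set(crossings)
    for start in crossings:
        if start + cycle_len in wanted:
            return start, start + cycle_len
    return None

# Test signal: sine with a 50-sample cycle, offset by half a sample.
wave = [math.sin(2 * math.pi * (i + 0.5) / 50) for i in range(200)]
points = loop_points(wave, 50)
```

Both loop points sit at the same phase, so the cycle between them can be read as a wavetable without a jump at the seam.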

acreil

There's some useful stuff about this on the music-dsp mailing list

http://music.columbia.edu/pipermail/music-dsp/2006-May/065373.html
http://music.columbia.edu/pipermail/music-dsp/2010-December/069413.html

...etc.

elden

i just stumbled upon this dude, here: http://www.zynaptiq.com/pitchmap/overview/
What the hell?! polyphonic? realtime? in that quality? What??
The future is about neural network pattern recognition, it's obvious...

katjav

@elden said:

What the hell?! polyphonic? realtime? in that quality? What??

By 'realtime', some people mean that your computer may be fast enough to process a 1 minute file in 1 minute. Zynaptiq advises to first try the pitchmap demo and see if your computer can handle the load.

Melodyne DNA also works 'realtime' in that sense. But it's not the same 'realtime' as in Pd.

Katja

elden

ok, but in contrast to melodyne it uses pattern recognition, which might consume more cpu, but it's still that fast!
but what is more of interest is how it processes pitch shifting. Could be, it just uses usual pitch shifting, because it doesn't need to pitch very far from the original notes, but that might be too simple. Too bad I cannot check out the demo - it's Mac only.
I'm also interested in how it extracts mono notes from polyphonic input compared to Melodyne and if it solves the Cocktail Party Problem, too, if it uses neural networks...

Jwif

Do you have a link to the demo download elden? Apparently it's not released until tomorrow, otherwise I would have checked it out.