It's probably good to have this documented somewhere:
Before I walk through it, I should mention that the method alexandros mentions is very common and efficient: store the waveform in a table, and read from the table with an interpolation scheme. (Lagrange interpolation, used in tabread4, is a bit less expensive than the other scheme commonly used in audio, the Hermite spline.) The way to accomplish this is to keep a float or a double as an accumulator, increment it every sample by an amount based on the frequency of the oscillator, and wrap it so it stays within the table it is reading from. You use the fractional part of this index to interpolate. Check out katja's pd-double file for an example:
https://github.com/pd-projects/pd-double/blob/master/src/d_osc.c
osc~ in normal pd uses the same thing, but with some bit trickery and only linear interpolation.
I think the reason for the Hoeldrich method is to avoid branch prediction/pipelining stalls when the index wraps, but really I have no clue...
I spent a long time (multiple days) figuring out d_osc.c about half a year ago:
The thing to do in order to understand the guts is to constantly think of the binary representation of the numbers used. This page may be helpful in this endeavor:
http://www.binaryconvert.com/convert_double.html
The first thing to understand is the properties of UNITBIT32. The file defines it as 1572864.0, i.e. 3 * 2^19, chosen so that "bit 32 has place value 1". In binary, 3 is 1.1 × 2^1, because the point shifts one place to the right with each multiplication by 2 (and remember that binary 1.1 is 1 and 1/2). This number is therefore 1.1 × 2^20.
In double-precision floating point there are 52 significand bits, 11 exponent bits, and 1 sign bit. Binary numbers in exponential notation are always of the form 1.(some series of 0s and 1s) × 2^exponent. Because the 1 to the left of the binary point is always there, it is not stored in the floating-point representation; it is assumed to exist. The only case where it does not is when the number is 0 (or another special case like a denormal), which floating point handles specially.
With floating-point numbers, the exponent normally changes whenever the magnitude calls for another power of 2. For instance, if you take the number 3 and multiply it by, say, 4, the significand does not change, only the exponent.
What UNITBIT32 does is pin down the precision when numbers are added or subtracted: there are always exactly 32 bits representing the fractional part of the number. Why? Because the significand is 52 bits, and the number 1.1 × 2^20 takes exactly 20 of those bits for the non-fractional part, leaving 32 bits for the fraction.
tabfudge is a union of a double and an array of two 32-bit unsigned ints; its purpose is to set specific bits of a double using ints.
d_osc.c exploits this in the code for phasor~ by setting the upper bits of the number (tf.tf_i[HIOFFSET]) to the constant normhipart every sample; normhipart is just the 32 bits on the "left" of UNITBIT32. So every sample those bits are reset to a known value. Then the resulting double-precision number is incremented or decremented by the appropriate amount according to the frequency, and afterwards UNITBIT32 is subtracted back out. The effect of this is to always keep only the fractional part, the 32 bits to the right of the binary point (because everything else is wiped clean every sample).
As for why 1.1 × 2^20 was used instead of 1 × 2^20: starting from exactly 2^20, subtracting any fraction would drop the value below 2^20 and change the exponent, shifting all the bits out of alignment. From 1.1 × 2^20 you can add or subtract up to 2^19 without crossing a power-of-two boundary, so the exponent (and with it the bit layout) never changes.
So now that the properties of UNITBIT32 and the union have been established, on to the explanation of osc_perform. I'm going to go through the non-unrolled version for simplicity (it's just like the unrolled version but in a different order, for what I assume are optimization reasons):
static t_int *osc_perform(t_int *w)
{
t_osc *x = (t_osc *)(w[1]);
t_float *in = (t_float *)(w[2]);
t_float *out = (t_float *)(w[3]);
int n = (int)(w[4]);
float *tab = cos_table, *addr, f1, f2, frac;
double dphase = x->x_phase + UNITBIT32;
int normhipart;
union tabfudge tf;
float conv = x->x_conv;
tf.tf_d = UNITBIT32;
normhipart = tf.tf_i[HIOFFSET];
Most of this is explained above or self-explanatory. HIOFFSET accounts for the difference in endianness between machines; tf.tf_i[HIOFFSET] is the high 32 bits. dphase is set to the current phase plus UNITBIT32, and then tf.tf_d, the "manipulated" double number, is set to UNITBIT32. The last two lines just set normhipart.
#if 0
while (n--)
{
tf.tf_d = dphase;
dphase += *in++ * conv;
dphase is advanced by the appropriate number of table samples for the input frequency (conv converts from Hz to table samples per audio sample). Note that dphase holds UNITBIT32 plus x_phase, which pins the precision so that the fractional part sits in the low 32 bits.
addr = tab + (tf.tf_i[HIOFFSET] & (COSTABSIZE-1));
The integer part of the index is taken modulo COSTABSIZE by masking, and the table address is obtained. This works because log2(COSTABSIZE) is less than 20, the number of integer bits available in the high 32 bits of the manipulated double.
tf.tf_i[HIOFFSET] = normhipart;
The integer part is erased by setting the high word back to the high bits of UNITBIT32.
frac = tf.tf_d - UNITBIT32;
f1 = addr[0];
f2 = addr[1];
*out++ = f1 + frac * (f2 - f1);
}
#endif
The fractional part is obtained by subtracting UNITBIT32, and the table read is linearly interpolated with it. The next section gets back to the stored phase x_phase, which is the table location to save.
tf.tf_d = UNITBIT32 * COSTABSIZE;
normhipart = tf.tf_i[HIOFFSET];
Because COSTABSIZE is a power of 2, this is just like the other normhipart but with a different exponent value. (The low 32 bits are no longer the fractional part of the corresponding double; the fraction has moved over to the right by log2(COSTABSIZE) places.) Now everything that won't fit within the range [0, COSTABSIZE) ends up in the HIOFFSET part of this number.
tf.tf_d = dphase + (UNITBIT32 * COSTABSIZE - UNITBIT32);
Remember that when dphase was created it was assigned x_phase plus UNITBIT32, so with that included this line simplifies to x_phase + UNITBIT32 * COSTABSIZE, where x_phase has been incremented every sample. That's my best guess of why the "- UNITBIT32" part is there: it cancels the UNITBIT32 that was added at the start.
tf.tf_i[HIOFFSET] = normhipart;
x->x_phase = tf.tf_d - UNITBIT32 * COSTABSIZE;
return (w+5);
}
Everything that didn't fit within COSTABSIZE was put into the HIOFFSET word, and now that word is set to normhipart in exactly the same way the non-fractional part of the read point was erased. Then UNITBIT32 * COSTABSIZE is subtracted to recover the saved read point, the same way the fractional part was extracted earlier.
I still don't understand a lot in this, like what the loop unrolling buys...