Sure. It's really about dealing with complex numbers.
The outputs of [fft~] are the real and imaginary parts of the spectrum. Together they give each complex value in cartesian form:
z = x + jy
Cartesian math is less computationally expensive, but it doesn't necessarily give you the information you're looking for on its own. Converting to polar form can give more intuitive information. Polar form looks like this:
z = A*e^(jθ)
Here, A is the magnitude (or amplitude) and θ is the angle (or phase). Converting from polar to cartesian is pretty straightforward:
A*e^(jθ) -> A*cos(θ) + j*A*sin(θ)
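If it helps to see that conversion outside of a patch, here is a minimal sketch in plain Python (not Pd). The function name polar_to_cartesian and the sample values are just made up for illustration.

```python
import cmath
import math

def polar_to_cartesian(magnitude, phase):
    """Return (x, y) for A*e^(j*theta), i.e. A*cos(theta) + j*A*sin(theta)."""
    x = magnitude * math.cos(phase)  # real part
    y = magnitude * math.sin(phase)  # imaginary part
    return x, y

# Example: magnitude 2 at a phase of 90 degrees (pi/2 radians)
x, y = polar_to_cartesian(2.0, math.pi / 2)

# The standard library's cmath.rect does the same conversion
# and returns a complex number instead of a pair.
z = cmath.rect(2.0, math.pi / 2)
```

Both versions should agree up to floating-point rounding, with the real part very close to 0 and the imaginary part very close to 2.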
Going the other way is trickier. The whole "squared, added, and square-rooted" thing is to find A. It works by plotting the cartesian form as a point on the cartesian plane, with the real part on the x-axis and the imaginary part on the y-axis. The distance between the origin and that point is A, and finding that distance is a matter of solving the Pythagorean theorem for right triangles that we all thought was useless in high school:
A^2 = x^2 + y^2
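As a rough sketch of the cartesian-to-polar direction, again in plain Python rather than Pd: the magnitude comes from the formula above, and the phase is the angle of the point (x, y), which atan2 gives you with the correct quadrant. The function name cartesian_to_polar and the sample values are assumptions for illustration.

```python
import math

def cartesian_to_polar(x, y):
    """Return (A, theta) for the complex number x + jy."""
    magnitude = math.sqrt(x * x + y * y)  # squared, added, square-rooted
    phase = math.atan2(y, x)              # angle in radians, quadrant-aware
    return magnitude, phase

# Example: real part 3, imaginary part 4 (a 3-4-5 right triangle)
mag, phase = cartesian_to_polar(3.0, 4.0)

# A point on the negative real axis: atan2 reports an angle of pi,
# which a plain atan(y / x) would get wrong.
mag_neg, phase_neg = cartesian_to_polar(-1.0, 0.0)
```

math.hypot(x, y) computes the same magnitude in one call, if you'd rather not spell out the squares.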
I assume it doesn't do that stuff automatically because it keeps it more general and less computationally expensive. If you stick to cartesian form, you don't have to go through square roots and atan2 (which is needed to find the phase). And [rfft~] doesn't normalize automatically because normalization is window-dependent.
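To illustrate the window-dependence, here is a small sketch in plain Python, assuming an unnormalized DFT computed by direct summation (not Pd's actual implementation). The block size, the test tone sitting exactly on a bin, and the helper name bin_magnitude are all assumptions chosen to keep the example simple.

```python
import cmath
import math

N = 64     # block size (assumed for illustration)
K0 = 8     # the test tone sits exactly on bin 8
AMP = 1.0  # amplitude we hope to read back

signal = [AMP * math.cos(2 * math.pi * K0 * n / N) for n in range(N)]

# Two windows: rectangular (no window) and a periodic Hann window
rect = [1.0] * N
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def bin_magnitude(samples, window, k):
    """Magnitude of DFT bin k of the windowed block (direct sum, no FFT)."""
    acc = 0j
    for n, (s, w) in enumerate(zip(samples, window)):
        acc += s * w * cmath.exp(-2j * math.pi * k * n / len(samples))
    return abs(acc)

# Same signal, same bin, but the raw magnitudes differ by window
raw_rect = bin_magnitude(signal, rect, K0)
raw_hann = bin_magnitude(signal, hann, K0)

# Dividing by half the sum of the window recovers the amplitude in this
# bin-centered case; the factor depends on which window was used.
norm_rect = raw_rect / (sum(rect) / 2)
norm_hann = raw_hann / (sum(hann) / 2)
```

For this setup the raw peak is about 32 with the rectangular window and about 16 with the Hann window, and both normalized values come out at about 1. So the right scale factor can't be picked without knowing the window.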