QUOTE(xiphmont @ Apr 11 2006, 06:46 AM)

Nevertheless, noise introduced by spectral quantization is not perceived as noise defined in the colloquial sense.
It depends. If you add white Gaussian noise in the frequency domain, it's no different from adding white Gaussian noise in the time domain: no difference at all. But why do we get those weird metallic-sounding artefacts when we quantize in the frequency domain? (me being rhetorical) Given that the MDCT (on the whole signal) is an orthogonal mapping, the quantization error you make in the frequency domain just gets "rotated" by the inverse transform. Why should it sound different from quantizing in the time domain? (again rhetorical)
Things get clearer if we imagine a high-dimensional space where every point is one possible "block of audio". By quantizing each component linearly with a constant quantizer step size, we map each point to another point on an orthogonal grid. So when you say it sounds different when done in the frequency domain compared to quantizing in the time domain, it must have something to do with the orientation of the grid, no? Because that is the only thing that changes when we apply an orthogonal transform.
At this point I'd like to bring in dithering. Why do we do it (in a DSP kind of sense) again? To keep the error's samples decorrelated! (Yes, we can do stuff like noise shaping too, but dithering is required nonetheless to guarantee a certain degree of independence of the error from the signal.)
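To make the "independence of the error from the signal" point concrete, here is a minimal sketch, not taken from any codec. It assumes a step size of 1, a constant input, and plain uniform dither one step wide; the names quantize and mean_error are made up for illustration. Without dither the error is a fixed function of the input. With dither the mean error goes to roughly zero whatever the input is.

```python
import math
import random

def quantize(v):
    """Uniform quantizer with step size 1 (round to nearest integer)."""
    return math.floor(v + 0.5)

def mean_error(x, n, dither, seed=0):
    """Average quantization error for a constant input x over n trials.

    With dither=True, uniform noise in [-0.5, 0.5) is added before
    quantizing (assumption: dither is exactly one step wide).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        d = rng.uniform(-0.5, 0.5) if dither else 0.0
        total += quantize(x + d) - x
    return total / n

# Undithered: the error is always the same value, fully tied to the input.
print(mean_error(0.3, 1000, dither=False))
# Dithered: the mean error is close to zero, regardless of the input.
print(mean_error(0.3, 20000, dither=True))
```

This only shows the first moment. Making the error power independent of the signal as well takes a different dither shape or a subtractive scheme.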
My proposition: If we quantize a signal with proper dithering, it does not matter at all in which domain we do it. It'll sound the same (real noise, no metallic-sounding rubbish), on condition that we keep a similar noise power distribution in the time/frequency space by applying noise shaping.
Justification: Let x be a random vector (its components are random variables, independent of each other, with the same standard deviation sigma). Suppose we apply the mapping Ax=y where A is an orthonormal matrix. What can we say about the distributions of y's components?
1) Each component is a random variable with the same standard deviation sigma
2) There's no correlation between components in y (there might be a dependence though, in the case of non-Gaussian distributions).
=> We get uncorrelated white noise with a constant per-sample standard deviation in the time domain. YAY!
This is just the case of a flat noise distribution in the time/frequency space, but it extends to other, more practical cases as well.
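The two statements about y can be checked without any random numbers, since the covariance of y is A * (sigma^2 I) * A^T. A small sketch, assuming an orthonormal DCT-IV matrix as a stand-in for "some orthonormal A" (the post doesn't fix a particular one; the helper names are made up):

```python
import math

def dct4_matrix(n):
    """Orthonormal DCT-IV matrix, used here only as an example of A."""
    scale = math.sqrt(2.0 / n)
    return [[scale * math.cos(math.pi / n * (j + 0.5) * (k + 0.5))
             for j in range(n)] for k in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def output_covariance(a, sigma):
    """Covariance of y = A x when x has covariance sigma^2 * I."""
    n = len(a)
    cov_x = [[sigma ** 2 if i == j else 0.0 for j in range(n)]
             for i in range(n)]
    return matmul(matmul(a, cov_x), transpose(a))

cov_y = output_covariance(dct4_matrix(8), sigma=0.5)
# Diagonal entries stay at sigma^2, off-diagonal entries vanish
# (up to floating-point rounding).
for row in cov_y:
    print(" ".join(f"{v:+.3f}" for v in row))
```

The diagonal is claim 1 (same sigma per component), the zero off-diagonals are claim 2 (no correlation). Nothing here says anything about higher-order dependence.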
So, if we quantize in the frequency domain with dithering properly applied, we're doing fine. BTW: I don't consider "noise normalization" to be dithering in the usual DSP sense. NN just tries to keep the energy of the quantized signal close to the original signal's energy, whereas dithering is mainly supposed to avoid nonlinear artefacts. NN helps avoid these a bit too, but not completely.
Now, the experienced reader might argue that dithering is not helpful when it comes to compression since it slightly increases entropy. But this only applies to additive dithering. So if I had to design the next codec, I'd seriously consider a subtractive dither. Subtractive dithering + quantization can be done simultaneously by switching randomly between different quantizers with the same step size but different offsets.
Example:
q1 = { ... -19, -15, -11, -7, -3, 1, 5, 9, 13, 17 ... }
q2 = { ... -17, -13, -9, -5, -1, 3, 7, 11, 15, 19 ... }
The quantizer (q1 or q2) will now be selected "randomly" for the current sample. A simple PRNG will do. Encoder and decoder need to know the exact state of this PRNG of course. The beauty of this example is the following: Notice that q1 = -q2. We can order the elements of q1 and q2 in ascending absolute values and assign an index to each of those starting at zero:
q1 = { q10= 1, q11=-3, q12= 5, q13=-7, q14= 9, q15=-11 ... }
q2 = { q20=-1, q21= 3, q22=-5, q23= 7, q24=-9, q25= 11 ... }
Assuming sample distributions that are symmetrical around zero, this enables us to use the same entropy coder for both quantizers, since Prob(sample=q1i) = Prob(sample=q2i) for all i. So all the indices can be grouped and coded using a single Huffman code table, for example.
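The q1/q2 switching and the shared index can be sketched like this. It's only an illustration under a few assumptions: Python's random.Random with a shared seed stands in for the "simple PRNG", and the function names are made up. The index formula follows from the ordering above: q1i = (-1)^i * (2i+1) and q2i = -q1i.

```python
import math
import random

STEP = 4  # spacing between neighbouring levels in both q1 and q2

def level_from_index(i, use_q2):
    """Map a shared index back to a level: q1_i = (-1)^i * (2i+1), q2_i = -q1_i."""
    level = (2 * i + 1) * (-1 if i % 2 else 1)
    return -level if use_q2 else level

def index_from_level(level):
    """Index of a level when levels are ordered by ascending absolute value."""
    return (abs(level) - 1) // 2

def nearest_level(x, use_q2):
    """Nearest level in q1 (4k+1) or q2 (4k-1) to the sample x."""
    offset = -1 if use_q2 else 1
    k = math.floor((x - offset) / STEP + 0.5)
    return STEP * k + offset

def encode(samples, seed):
    prng = random.Random(seed)  # decoder must start from the same state
    return [index_from_level(nearest_level(x, prng.getrandbits(1) == 1))
            for x in samples]

def decode(indices, seed):
    prng = random.Random(seed)  # same seed, so same q1/q2 choice per sample
    return [level_from_index(i, prng.getrandbits(1) == 1) for i in indices]

samples = [0.0, 2.3, -6.9, 10.5, -0.4, 7.7]
indices = encode(samples, seed=1234)
print(indices)
print(decode(indices, seed=1234))
```

Only the indices go to the entropy coder. The sign and the choice of quantizer are recovered from the index parity and the PRNG, which is why one code table is enough.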
Hey, we might even want to go one step further. How about trellis-coded quantization? (Switching quantizers depending not only on the output of the PRNG but also on the previously quantized sample.) This allows some clever rate/distortion optimization using the Viterbi algorithm. Benefits: a better rate/distortion ratio (approaching VQ) while still being simple to encode and decode.
Oh boy! It took some time writing this.

Sebi