Is the MDCT good enough?

Topic: Is the MDCT good enough? (Read 24329 times) previous topic - next topic

0 Members and 1 Guest are viewing this topic.

Is the MDCT good enough?

2004-08-28 08:21:09

Today I heard someone say that the MDCT is imperfect by design, and thus all codecs that depend on it are all imperfect.

Quote

The Fourier transform allows perfect transformation from time domain to the frequency domain. However, the formula for the Fourier transform has elements of infinity inside, and cannot be used as is in an algorithm. Massive compromises were made, and a newer algorithm called the MDCT was born. This is as ugly as hell compared to the original Fourier transform.

The Fourier transform separates a PCM signal into sines and cosines, making a frequency spectrum. The number of sines and cosines that have to be dealt with are huge, so an Euler's equation is used to calculate the sines and the cosines as a whole through the exponents.

But the problem is, in Euler's equation goes something like exponent formula = cos(X) + i*sin(Y). i is the imaginary number, the square root of -1. You can't use an imaginary number in a real-life algorithm, so the sine part is discarded. Hence the Discrete Cosine Transform (DCT) was born. Notice how it fails to mention "Sine." Of course, a DST also exists.

To compensate for the sine-less DCT, modifications were made to the DCT, and one of the better schemes is the MDCT, which was adopted for lossy compression. How does it work? It splits one second of PCM data into 32 segments. And you apply the transform on each of the segments, and get 32 segments of frequency domain information. But you can't add them all up, because the start and end of the signal will be misaligned, so a window function is applied to make the ends weaker before adding.

Of course, with an ideal transform into the frequency domain, window fuctions should not exist!! The MDCT is an imperfect algorithm with many compromises, and you have to choose between time domain resolution and frequency domain resolution.

What I mean is, split 1 second into 100 parts and you can record more differences in the time domain. Split 1 second into 10 parts and the larger PCM segment will offer better frequency resolution. Vorbis can vary the split-size, as the need arises. There's a trade-off here, so Vorbis is making a compromise. As a side note, for tones higher than 500Hz, the human ear can identify amplitude in intervals of 10ms.

And for the Fourier transform to be applied at a single time, the cutting has to be done in even Hz boundaries. If you choose to cut in 1Hz segments, it has to be done from 20, 21, 22, ... , 20001, 20002Hz .. like this. Well, the original Fourier transform doesn't have such a limitation, but there's another faster Fourier transform algorithm which also makes compromises.

But the problem is, our ears don't perceive pitch like that. If a tone is at 200Hz, an octave higher than that will be 400Hz, and the one above it will be 800Hz. If the pitch is saved at even intervals, the disk-space for 20Hz~40Hz and 20000Hz~40000Hz will be the same.

And of course, the MDCT being not very ideal, it can only choose intervals in units of Hz. So MP3, WMA, and Vorbis all make petty compromises at 20000Hz. In an ideal case, the data size for saving up to 20000Hz will be only a little larger than the segment for saving up to 15000Hz. In the MDCT, it will have to be 4/3 times larger. Hence, MP3 and Vorbis are wasting enormous amounts of bits near high frequencies. That's the price you have to pay for making compromises with reality. Although when making compromises with reality, you have no choice..

Now that I've explained all this, you'll now know that the core algorithm, the MDCT, makes loads of compromises and is thus very far from ideal. If the frequency domain transformation is not perfect, then optimizing the acoustically imperceptible aspects won't be perfect either. Thus, from an academic point of view, lossy audio compression is very flawed. From a pragmatic point of view, it's not half bad. There have been efforts made to conceal these flaws, and if you take a listen, it's not so bad.

But it is far from perfect, and ruthlessly giving decoding/encoding time can get you better results.

What I want to say is, in the discussion on Vorbis vs. CD on the thread below, there have been opinions made advocating Vorbis based on psychoacoustic masking effects and such, but Vorbis is not perfect at all, and even throws out data that the human ear can perceive. That's a fact. There can be differences in degree, and it may get better at higher bitrates, but the gains from high bitrate Vorbis are not from better acoustic optimization, but rather from mindlessly throwing more bits at the data.

My conclusion is, Vorbis is imperfect, and though there may be differences in degree, it throws out perceptible data, thus it can't shake a stick at lossless PCM.

Oh, and one more thing. listening on high-end gear is definitely better than listening on low-end gear. The resolution or clarity speaks for itself.

Naturally, I was flabbergasted. Some of the more blatant flaws were pointed out at once, like "You have to listen to make a judgement on sound quality," or such, but he keeps on saying that the MDCT is very unattractive.

I don't think that's a fair statement. How should I repute this FUD?

edit: Or does he really have a point here? I hope not, but I want to know.

Is the MDCT good enough?

Reply #1 – 2004-08-28 09:46:34

OMG. The "original poster" says he was targetting me because I kept claiming that Vorbis is CD quality! I have never done such a thing in my life.

He's trolling, and he's trolling for me. I guess I'll just have to laugh it off.

Is the MDCT good enough?

Reply #2 – 2004-08-28 10:59:40

I can't elaborate much because I haven't had a look at Vorbis code but:
- if we don't take into account the fact that the MDCT samples are quantized (to various bit depths), then the MDCT is a mathematically reversible transformation, given one more block of zero samples at the beginning and the end of the file (because it's an overlapping transform).
- In the whole psymodel (which determines how much quantization can be allowed in MDCT samples), as far as I know, Vorbis uses only FFT's. So the MDCT has no influence there.

So, unless a Vorbis developer says otherwise, I think there's no problem at all

Thus we can safely assume that guy doesn't know what he's talking about..

Is the MDCT good enough?

Reply #3 – 2004-08-28 13:55:02

Quote

You can't use an imaginary number in a real-life algorithm, so the sine part is discarded

I don't see why we couldn't use an imaginary number in a real-life algorithm !!
I did it in eighth grade in order to draw a fractal on a computer that must have been a 386 or 486 CPU.

Is the MDCT good enough?

Reply #4 – 2004-08-28 14:12:41

Heh. What happens when there *are* inadequacies in the MDCT transformation? I know that a low-accuracy IMDCT in Vorbis playback can have problems with low level noise if the dynamic range is very wide.

Is the MDCT good enough?

Reply #5 – 2004-08-28 14:17:50

Quote

Quote
You can't use an imaginary number in a real-life algorithm, so the sine part is discarded

I don't see why we couldn't use an imaginary number in a real-life algorithm !!
I did it in eighth grade in order to draw a fractal on a computer that must have been a 386 or 486 CPU.
[a href="index.php?act=findpost&pid=237627"][{POST_SNAPBACK}][/a]

I really don't understand all these argument on the imperfection of the MDCT.. Even FFT isn't perfect.. in terms of aliasing etc-etc..

Is the MDCT good enough?

Reply #6 – 2004-08-28 14:18:53

Quote

Heh. What happens when there *are* inadequacies in the MDCT transformation? I know that a low-accuracy IMDCT in Vorbis playback can have problems with low level noise if the dynamic range is very wide.
[a href="index.php?act=findpost&pid=237630"][{POST_SNAPBACK}][/a]

It will get noisier than it should, which may drown the signal.

(You could replace MDCT by anything in that sentence and It'd still be valid - what you say has nothing to do with the MDCT itself basically, let alone with the troll that the first post was!)

Is the MDCT good enough?

Reply #7 – 2004-08-28 14:21:01

Quote

Quote
Quote
You can't use an imaginary number in a real-life algorithm, so the sine part is discarded

I don't see why we couldn't use an imaginary number in a real-life algorithm !!
I did it in eighth grade in order to draw a fractal on a computer that must have been a 386 or 486 CPU.
[a href="index.php?act=findpost&pid=237627"][{POST_SNAPBACK}][/a]

I really don't understand all these argument on the imperfection of the MDCT.. Even FFT isn't perfect.. in terms of aliasing etc-etc..[a href="index.php?act=findpost&pid=237631"][{POST_SNAPBACK}][/a]

The logic he's using in his follow-up comments goes something like this: The MDCT is flawed, but the IMDCT covers that. The problem is, the psychoacustic compression kicks in before the IMDCT, and the IMDCT (edit: and lossy compression) works on flawed data, giving flawed results.

edit: grammar

Is the MDCT good enough?

Reply #8 – 2004-08-28 16:49:42

Quote

The logic he's using in his follow-up comments goes something like this: The MDCT is flawed, but the IMDCT covers that. The problem is, the psychoacustic compression kicks in before the IMDCT, and the IMDCT (edit: and lossy compression) works on flawed data, giving flawed results.

Well, his assumption is completely flawed.

The encoder compares:
" original signal "

with:
" original signal -> MDCT -> quantization -> IMDCT "

If the quantization is non-existent, then the signal is restored perfectly.

The amount of quantization is DRIVEN BY the quality in the decoded signal. Since the MDCT is a reversible overlapping transform, the encoder can knowingly achiever whatever quality level is requested. Thus his argument is crap.

In other words, the encoder is perfectly "IMDCT-aware" and compares the distorsion introduced by the process, AT ALL TIMES. The maximum allowed distorsion is computed for each frequency band, by the psymodel. The psymodel DOES NOT RELY on the MDCT to do its work, it relies on FFT, filters and time-domain analysis, so it is a non-issue.

Is the MDCT good enough?

Reply #9 – 2004-08-28 19:12:25

This is somewhat off-topic, but why is the MDCT used instead of (M?)FFT? Aren't the psychoacoustics generally done with the FFT anyway?

The idea that "you can't use imaginary numbers in real-life" is clearly bogus, but I've been curious.

Thanks

Is the MDCT good enough?

Reply #10 – 2004-08-28 20:40:34

If anyone cares about my opinion:

He has some knowledge but not enough to say "the (i)MDCT is imperfect for coding signals"

... whatever his definition of perfect/imperfect is ...

As for FFT versus MDCT:
The point is: The (i)MDCT serves its purpose better than any other transform he mentioned.
- no discontinuities (block artifacts)
- critically sampled low delay filterbank
- good energy compaction (for steady signal parts)

Anything else I had to say would go into the same direction as what NumLOCK has already said - so, I'll just leave it.

(I've the slight feeling that my grammar sucks in this post, sorry for this)

Sebastian

Is the MDCT good enough?

Reply #11 – 2004-08-29 03:05:39

Quote

Quote
You can't use an imaginary number in a real-life algorithm, so the sine part is discarded

I don't see why we couldn't use an imaginary number in a real-life algorithm !!
I did it in eighth grade in order to draw a fractal on a computer that must have been a 386 or 486 CPU.
[a href="index.php?act=findpost&pid=237627"][{POST_SNAPBACK}][/a]

Plus imaginary numbers, used in the context of phasors, can be used for representing components which have 90 degrees phase lag.

As for the FFT versus DCT, an N-point FFT assumes a period of N which means that for real-life signals (lowpass), there will be a discontinuity due to the difference in magnitude and this costs in terms of energy spread, while the DCT assumes a period of 2N, thus it is like a mirror reflection and is smoother. Of course, if it is a high-pass signal, the DST is the beter transform, but since we are talking about real-life signals, most of them are low-pass ones so DCT will always have better energy compaction than the FFT.

I'm surprised the original poster didnt just answer with a one liner, since obviously he is talking about objective quality. "Lossless PCM has infinite SNR while Vorbis has xx.xx dB....thus PCM is obviously so much better"

Is the MDCT good enough?

Reply #12 – 2004-08-29 07:41:46

OK, thanks everyone for helping..

Does the Hamming window function blunt the peaks?

edit: http://www.xiph.org/ogg/vorbis/doc/vorbis-spec-intro.html are these Hamming windows?

Is the MDCT good enough?

Reply #13 – 2004-08-29 08:44:55

Quote

http://www.xiph.org/ogg/vorbis/doc/vorbis-spec-intro.html are these Hamming windows?
[a href="index.php?act=findpost&pid=237810"][{POST_SNAPBACK}][/a]

No, those are sine windows.

Is the MDCT good enough?

Reply #14 – 2004-08-29 08:55:13

Quote

OK, thanks everyone for helping..

Does the Hamming window function blunt the peaks?

edit: http://www.xiph.org/ogg/vorbis/doc/vorbis-spec-intro.html are these Hamming windows?
[a href="index.php?act=findpost&pid=237810"][{POST_SNAPBACK}][/a]

No, I believe Vorbis uses it's own special window function (sine based). I saw the equation for it once, but I forget it. Somebody else could eleborate. ;-D

Is the MDCT good enough?

Reply #15 – 2004-08-29 08:57:25

Actually, they're Vorbis windows Vorbis has it's own windowing function which is different from a normal sine window.

Is the MDCT good enough?

Reply #16 – 2004-08-29 13:30:25

Quote

Quote
Quote
You can't use an imaginary number in a real-life algorithm, so the sine part is discarded

I don't see why we couldn't use an imaginary number in a real-life algorithm !!
I did it in eighth grade in order to draw a fractal on a computer that must have been a 386 or 486 CPU.
[a href="index.php?act=findpost&pid=237627"][{POST_SNAPBACK}][/a]

Plus imaginary numbers, used in the context of phasors, can be used for representing components which have 90 degrees phase lag.

As for the FFT versus DCT, an N-point FFT assumes a period of N which means that for real-life signals (lowpass), there will be a discontinuity due to the difference in magnitude and this costs in terms of energy spread, while the DCT assumes a period of 2N, thus it is like a mirror reflection and is smoother. Of course, if it is a high-pass signal, the DST is the beter transform, but since we are talking about real-life signals, most of them are low-pass ones so DCT will always have better energy compaction than the FFT.

I'm surprised the original poster didnt just answer with a one liner, since obviously he is talking about objective quality. "Lossless PCM has infinite SNR while Vorbis has xx.xx dB....thus PCM is obviously so much better"
[a href="index.php?act=findpost&pid=237783"][{POST_SNAPBACK}][/a]

I think there are some differences between the DCT and the MDCT..

wkwai

Is the MDCT good enough?

Reply #17 – 2004-08-29 18:30:11

Quote

I think there are some differences between the DCT and the MDCT..
[{POST_SNAPBACK}][/a]

Sure there are!

The DCT we all know is actually called "DCT type 2/3" (forward/inverse).

The MDCT is a concatenation of some butterflies across block boundaries which are derived from the window functions and a type 4 DCT. The cool thing about a type 4 DCT is: Its matrix is symmetrical (and orthogonal) and thus its own inverse.

If you extend the basis vectors of the type-4-DCT you'll get a similar "mirror-effect" like in type-2/3-DCTs , just inverted on one side.

[a href="http://www.hydrogenaudio.org/forums/index.php?showtopic=20449](i)MDCT Example for 3 "blocks" with the blocksize n=8[/url]

Sebastian

Is the MDCT good enough?

Yes, it is clear, but I still do not agree

Let's take the example of 2 stationary tones, closely spaced in frequency. With a short-length MDCT, i.e. a filterbank of low-frequency resolution, I will not be able to distinguish the 2 tones, they are both jointly represented with only 1 coefficient. A psymodel will tell me that due to masking I can and should quantize one of the tones coarser, but my transform cannot separate the 2 tones. This I call lack of accuracy.

Let's take the example of 2 dirac impulses, closely spaced in time. With a long-length MDCT, i.e. a filterbank of high-frequency resolution, I will not be able to distinguish the 2 impulses, they are both represented with M coefficients. A psymodel will tell me that due to temporal masking I should quantize the impulses separately, but my transform cannot separate the 2 impulses. This I also call lack of accuracy.
[a href="index.php?act=findpost&pid=238083"][{POST_SNAPBACK}][/a]

*NOWHERE IN THIS DISCUSSION* has anyone said *ANYTHING* about using MDCT for *PSYMODELS*, in fact it was stated several times that for psymodels FFT is generally preferred.

Psymodels with MDCT are possible but *do* have issues, some of which you mention. This is why they aren't used a lot and not by Vorbis at all.