QUOTE (hyeewang @ Aug 16 2009, 22:21)

QUOTE (yneedshelp @ Aug 13 2009, 10:02)

Hello,
To obtain MFCC coefficients, we do the following:
(sound signal frame in time domain) -> FFT (power spectrum) -> mel-scale filterbank -> log -> DCT
Our goal here is to extract a characteristic (feature) vector for the signal frame while reducing the number of values.
For example, when the frame has 400 samples, keeping the non-redundant half of the FFT leaves about 200 values (201 bins for a real signal, counting DC and Nyquist). After mel-scale filtering, the number is, say, 40.
It seems to me that taking the DCT does not compress the information at all, since the number of points after the DCT is still 40...
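A minimal NumPy/SciPy sketch of this pipeline for a single frame, assuming a 16 kHz sample rate (so 400 samples = 25 ms), the common HTK-style mel formula, and triangular filters. `mfcc_frame` and `mel_filterbank` are hypothetical helper names, not a particular library's API. It makes the point counts concrete: 400 samples -> 201 FFT bins -> 40 mel energies -> 40 DCT coefficients.

```python
import numpy as np
from scipy.fft import rfft, dct


def hz_to_mel(f):
    # HTK-style mel formula (an assumption; other mel variants exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Hypothetical helper: triangular mel filters, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.linspace(0.0, sample_rate / 2, n_bins)   # centre frequency of each rfft bin
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mfcc_frame(frame, sample_rate, n_filters=40, n_keep=None):
    """Hypothetical helper: MFCCs of one time-domain frame."""
    power = np.abs(rfft(frame)) ** 2                 # 400 samples -> 201 bins
    fb = mel_filterbank(n_filters, len(frame), sample_rate)
    mel_energies = fb @ power                        # 201 bins -> 40 band energies
    log_mel = np.log(mel_energies + 1e-10)           # small floor avoids log(0)
    coeffs = dct(log_mel, type=2, norm="ortho")      # 40 values -> still 40 coefficients
    return coeffs if n_keep is None else coeffs[:n_keep]


rng = np.random.default_rng(0)
frame = rng.standard_normal(400)                     # stand-in for a real speech frame
print(mfcc_frame(frame, 16000).shape)                # (40,)
print(mfcc_frame(frame, 16000, n_keep=12).shape)     # (12,)
```

Note that the DCT itself returns as many coefficients as it is given; any reduction comes from choosing `n_keep`.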
Am I missing something here?
If the DCT is for data compression, how do I actually get that effect?
If the DCT is not for data compression, what is it doing?
Thanks!
Although you can get 40 MFCC coefficients, the leading 12 or so are enough to capture the essential characteristics of the signal.
In my humble opinion, that is where the DCT's data compression property resides.
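A small sketch of that energy-compaction idea, assuming a made-up, smooth 40-band "log-mel spectrum" (real log-mel spectra are smooth envelopes plus some fine detail, and whether 12 coefficients suffice for real speech is an empirical choice). It keeps the first 12 orthonormal DCT-II coefficients, zeroes the rest, and inverts.

```python
import numpy as np
from scipy.fft import dct, idct

# Hypothetical smooth envelope over 40 mel bands (illustration only, not real data)
bands = np.arange(40)
log_mel = (2.0 * np.cos(2 * np.pi * bands / 40)
           + 0.5 * np.sin(2 * np.pi * 2 * bands / 40)
           - 0.03 * bands)

coeffs = dct(log_mel, type=2, norm="ortho")

# Keep only the leading 12 coefficients and zero the rest
truncated = np.zeros_like(coeffs)
truncated[:12] = coeffs[:12]
reconstructed = idct(truncated, type=2, norm="ortho")

energy_kept = np.sum(coeffs[:12] ** 2) / np.sum(coeffs ** 2)
max_error = np.max(np.abs(log_mel - reconstructed))
print(f"energy in first 12 of 40 coefficients: {energy_kept:.4f}")
print(f"max reconstruction error: {max_error:.4f}")
```

For a smooth envelope almost all the energy lands in the low-order coefficients, so dropping the tail changes the envelope very little; a spiky or noisy input would not compact nearly as well.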
Sigh.
Some terms to look up:
"transform gain"
"Diagonalization"
"matched filtering"
These may point you both in the right direction.
To explain:
If we have a full-scale sine wave, its average amplitude (i.e. mean absolute value) is roughly half of full scale (2/π ≈ 0.64, to be exact). While what I'm saying is only approximate and not mathematically rigorous, it means that with 16-bit samples you are using, say, about 15 bits per sample on average.
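A quick numerical check of the average-amplitude estimate, assuming 16-bit signed samples (peak 32767) and an arbitrary choice of 100 whole cycles in a 65536-sample frame. The mean absolute value of a sine is 2/π of its peak, and log2 of that mean gives the rough bit count.

```python
import numpy as np

n = 65536
full_scale = 32767                                  # largest positive 16-bit sample value
t = np.arange(n)
x = full_scale * np.sin(2 * np.pi * 100 * t / n)    # 100 whole cycles (arbitrary choice)

ratio = np.mean(np.abs(x)) / full_scale
bits = np.log2(np.mean(np.abs(x)))
print(ratio)    # close to 2/pi, about 0.6366
print(bits)     # about 14.35 bits of magnitude, plus a sign bit
```

About 14.35 bits of magnitude plus a sign bit comes to roughly 15 bits per sample, in line with the estimate in the post.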
Now, if we take a 65536-point transform in which one particular basis vector happens to match the sine wave exactly (please look up transforms to see what a basis vector is!), you will have 65535 lines with ZERO information and 1 line with an amplitude of 65536.
This gives you 16 bits above 1 (and 16 below), for a total of about 32 bits for the entire signal representation, spread over 65536 samples.
That's hardly 15 bits per sample.
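The idealized case can be reproduced with a DCT-II, assuming SciPy's unnormalized definition y[k] = 2 Σ x[n] cos(πk(2n+1)/(2N)). A unit-amplitude cosine that is exactly basis vector `k0` (an arbitrary choice here) then yields a single coefficient equal to N = 65536 and numerically zero everywhere else. With `norm="ortho"` the peak would instead be sqrt(N/2); the exact number depends on normalization, but the single-line result does not.

```python
import numpy as np
from scipy.fft import dct

n = 65536
k0 = 1000                                    # which basis vector the signal matches (arbitrary)
idx = np.arange(n)
x = np.cos(np.pi * k0 * (2 * idx + 1) / (2 * n))   # exactly DCT-II basis vector k0

X = dct(x, type=2)                           # unnormalized DCT-II (SciPy's default norm=None)

peak = X[k0]
others = np.delete(X, k0)
print(peak)                                  # 65536 up to floating-point rounding
print(np.max(np.abs(others)))                # tiny floating-point residue, effectively zero

bits_total = 2 * np.log2(n)                  # the post's 16 + 16 bits
print(bits_total / n)                        # 32 / 65536, about 0.00049 bits per sample
```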
This is, of course, an extremely contrived case of transform gain; in practice, with windowing and so on, you cannot achieve this kind of gain. It does, however, explain where the basic gain comes from.
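To see why real signals fall short, here is a sketch comparing how many orthonormal DCT-II coefficients are needed to hold 99.9% of the energy for three tones: one exactly on a basis vector, one halfway between two basis vectors, and the same off-grid tone with a Hann window (`np.hanning`). The DCT, the 4096-point length, the 99.9% threshold, and the helper name `coeffs_for_energy` are all assumptions for illustration.

```python
import numpy as np
from scipy.fft import dct


def coeffs_for_energy(x, fraction=0.999):
    """Hypothetical helper: fewest DCT coefficients holding `fraction` of the energy."""
    c = dct(x, type=2, norm="ortho")
    energy = np.sort(c ** 2)[::-1]                  # largest coefficients first
    cumulative = np.cumsum(energy) / np.sum(energy)
    return int(np.searchsorted(cumulative, fraction)) + 1


n = 4096
idx = np.arange(n)
# Exactly DCT-II basis vector 100: the idealized case
matched = np.cos(np.pi * 100 * (2 * idx + 1) / (2 * n))
# Halfway between basis vectors 100 and 101: energy leaks into many coefficients
between = np.cos(np.pi * 100.5 * (2 * idx + 1) / (2 * n))
# Same off-grid tone with a Hann window: far-off leakage drops, the main lobe widens
windowed = between * np.hanning(n)

counts = {name: coeffs_for_energy(sig)
          for name, sig in [("matched", matched), ("between", between), ("windowed", windowed)]}
print(counts)
```

With these settings the matched tone needs a single coefficient and the off-grid tone needs hundreds; windowing brings that back down a long way, but never to one.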
Basically, log_2(n) is a lot smaller than n for most n.
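A toy calculation of that last point, assuming the post's 16 + 16 = 32 bits for n = 65536 generalizes to 2·log2(n) bits per block (that generalization, the 15-bit raw cost, and the helper name `ideal_bits_per_sample` are assumptions for illustration).

```python
import math


def ideal_bits_per_sample(n):
    """Hypothetical helper: 2*log2(n) bits for the whole block, spread over n samples."""
    return 2 * math.log2(n) / n


RAW_BITS_PER_SAMPLE = 15    # rough figure for a full-scale 16-bit sine, from the post

for n in (64, 1024, 65536):
    print(f"n={n:6d}: raw ~{RAW_BITS_PER_SAMPLE} bits/sample, "
          f"idealized transform ~{ideal_bits_per_sample(n):.5f} bits/sample")
```

The idealized cost per sample shrinks roughly like log2(n)/n, which is why the gain in the perfectly matched case grows so dramatically with transform length.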