Full Version: FilterBank in MP3 and AAC
kennyzero
We know that MP3 uses a QMF filterbank followed by an MDCT, whereas AAC uses only an MDCT.
We also know that the quality of AAC is better than that of MP3 at the same bitrate.

Why is this true? Thanks.
Garf
The addition of the filterbank in MP3 was more due to hardware requirements and bureaucracy than technical reasons.
kennyzero
QUOTE(Garf @ Jun 27 2003 - 01:27 PM)
The addition of the filterbank in MP3 was more due to hardware requirements and bureaucracy than technical reasons.

Then how come many people say that the filterbank used in AAC is better than the one in MP3?

MP3 runs the QMF and then the MDCT, while AAC runs the MDCT only... is the only difference complexity?
rjamorim
In this case, clearly, MDCT alone is better than QMF+MDCT

QMF was added to MP3 mostly for political reasons, i.e., to make Philips happier.
kennyzero
QUOTE(rjamorim @ Jun 29 2003 - 06:42 PM)
In this case, clearly, MDCT alone is better than QMF+MDCT

QMF was added to MP3 mostly for political reasons, i.e., to make Philips happier.

Thanks

Still, I don't understand why adding the QMF makes things worse than using the MDCT alone. Could anyone explain further? Thanks very much.
tangent
Quadrature Mirror Filterbank (QMF) is also used in MP2 to break the wideband signal into 32 subbands of equal bandwidth. In MP3, this is extended by further breaking each of the subbands down with an 18-point MDCT, and by concatenating all the transforms of the 32 subbands together you get a 1184 point spectral signal. That's also one of the reasons why MP3 uses a hybrid QMF-MDCT, as an extra layer on top of audio Layer 2. Vorbis and AAC do a straight 1024 or 2048 point MDCT on the wideband signal. QMF is not a perfectly reversible function, so you cannot get back the exact original signal with an inverse function, while this is possible with the iMDCT.
wkwai
Also, you have probably noticed that there is some inter-band spectral aliasing in the MP3 PQF filterbank. As a result, it has to be compensated for in the frequency domain.

Then, because the time-domain signal is split into 32 subband signals before being transformed to the frequency domain with a far shorter MDCT window, the spectrum is not as compact as in AAC, which uses a much longer MDCT window. As a result, the coding efficiency of the MP3 filterbank is about 30% lower than that of AAC. You have probably noticed that AAC works at bitrates of 96 kbps where MP3 needs 128 kbps.

However, the structure of a single MDCT block in AAC prohibits the implementation of a scalable decoder. A scalable decoder is possible for MP3 because of the PQF filterbank structure.

This issue was addressed in MPEG-4 AAC, which has a version that uses the Sony gain-control 4-band PQF structure.
kennyzero
QUOTE(tangent @ Jun 29 2003 - 10:35 PM)
Quadrature Mirror Filterbank (QMF) is also used in MP2 to break the wideband signal into 32 subbands of equal bandwidth. In MP3, this is extended by further breaking each of the subbands down with an 18-point MDCT, and by concatenating all the transforms of the 32 subbands together you get a 1184 point spectral signal. That's also one of the reasons why MP3 uses a hybrid QMF-MDCT, as an extra layer on top of audio Layer 2. Vorbis and AAC do a straight 1024 or 2048 point MDCT on the wideband signal. QMF is not a perfectly reversible function, so you cannot get back the exact original signal with an inverse function, while this is possible with the iMDCT.

Thanks very much.

another question...how come concatenation creates a 1184-point spectral signal?
kennyzero
QUOTE(wkwai @ Jun 30 2003 - 04:51 AM)
Also, you have probably noticed that there is some inter-band spectral aliasing in the MP3 PQF filterbank. As a result, it has to be compensated for in the frequency domain.

Then, because the time-domain signal is split into 32 subband signals before being transformed to the frequency domain with a far shorter MDCT window, the spectrum is not as compact as in AAC, which uses a much longer MDCT window. As a result, the coding efficiency of the MP3 filterbank is about 30% lower than that of AAC. You have probably noticed that AAC works at bitrates of 96 kbps where MP3 needs 128 kbps.

However, the structure of a single MDCT block in AAC prohibits the implementation of a scalable decoder. A scalable decoder is possible for MP3 because of the PQF filterbank structure.

This issue was addressed in MPEG-4 AAC, which has a version that uses the Sony gain-control 4-band PQF structure.

Thanks very much.

Sony Gain-Control 4 band PQF structure = AAC-SSR Profile?
tangent
QUOTE(kennyzero @ Jul 2 2003 - 09:54 AM)
another question...how come concatenation creates a 1184-point spectral signal?

Should be 1152, sorry.
wkwai
QUOTE(kennyzero @ Jul 1 2003 - 05:56 PM)
Sony Gain-Control 4 band PQF structure = AAC-SSR Profile?

Yes, that is the MPEG-4 AAC-SSR profile.
wkwai
QUOTE(kennyzero @ Jul 1 2003 - 05:54 PM)
another question...how come concatenation creates a 1184-point spectral signal?

For long blocks, the MP3 filterbank produces a 576-point spectral signal, not 1152!
AAC, on the other hand, has a 1024-point spectral signal, almost twice that of MP3.
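
To put those two figures side by side, here is a rough sketch of the nominal spacing between spectral lines for each codec. The 44.1 kHz sample rate and the helper name `line_spacing_hz` are assumptions made for illustration; neither comes from the thread.

```python
def line_spacing_hz(sample_rate_hz, spectral_lines):
    """Nominal spacing between adjacent spectral lines, in Hz.

    The lines are assumed to span 0 Hz up to the Nyquist frequency
    (half the sample rate), evenly spaced.
    """
    return (sample_rate_hz / 2) / spectral_lines


SAMPLE_RATE_HZ = 44100   # assumed CD-rate input, not stated in the thread
MP3_LONG_LINES = 576     # long-block figure quoted above
AAC_LONG_LINES = 1024    # long-block figure quoted above

mp3_spacing = line_spacing_hz(SAMPLE_RATE_HZ, MP3_LONG_LINES)
aac_spacing = line_spacing_hz(SAMPLE_RATE_HZ, AAC_LONG_LINES)
ratio = AAC_LONG_LINES / MP3_LONG_LINES

print(f"MP3: {mp3_spacing:.2f} Hz per line")   # MP3: 38.28 Hz per line
print(f"AAC: {aac_spacing:.2f} Hz per line")   # AAC: 21.53 Hz per line
print(f"AAC/MP3 line count ratio: {ratio:.2f}")  # 1.78
```

Under that assumption, "almost twice" works out to about 1.78 times as many lines per long block. The MP3 figure is a nominal one, since the aliasing between subbands mentioned earlier in the thread blurs the effective resolution.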
JohnV
QUOTE(tangent @ Jul 2 2003 - 08:19 AM)
QUOTE(kennyzero @ Jul 2 2003 - 09:54 AM)
another question...how come concatenation creates a 1184-point spectral signal?

Should be 1152, sorry.

MP3 has 1152 samples per frame, but as wkwai said 576 spectral lines for long block.
kennyzero
QUOTE(JohnV @ Jul 2 2003 - 01:19 AM)
QUOTE(tangent @ Jul 2 2003 - 08:19 AM)
QUOTE(kennyzero @ Jul 2 2003 - 09:54 AM)
another question...how come concatenation creates a 1184-point spectral signal?

Should be 1152, sorry.

MP3 has 1152 samples per frame, but as wkwai said 576 spectral lines for long block.

I think I understand what you mean. Let me check something; please tell me whether I am right. Thanks.
"1152 samples per frame" is in the time domain, and "576 spectral lines for long block" is in the frequency domain. That also implies that in each frame we have 1152*576 spectral lines.

How about AAC? I just know it has 1024 spectral lines for long block. Samples per frame also 1024?
kennyzero
Actually, I want to know the details of how the filterbank is applied to the input signal in MP3 and AAC.

My friend told me that, in the time domain, the window shifts by 32 samples each time. The windowed signal is divided into 32 subbands. The signal energy in each subband is summed and used for the SNR computation later.

Separately, a 1024-point FFT is applied to the input signal to obtain a more accurate frequency-domain representation.

The SNR and the frequency-domain representation are then used for the masking calculation.

Is that correct? I am still quite confused.
Is there an "animation"-style explanation that would give me a clearer picture?
ErikS
QUOTE(kennyzero @ Jul 3 2003 - 02:09 AM)
How about AAC? I just know it has 1024 spectral lines for long block. Samples per frame also 1024?

Since the MDCT is a critically sampled transform with 50% overlap, I'd expect each block to take twice as many time-domain samples as it produces spectral lines in the frequency domain.
kennyzero
QUOTE(ErikS @ Jul 2 2003 - 07:33 PM)
QUOTE(kennyzero @ Jul 3 2003 - 02:09 AM)
How about AAC? I just know it has 1024 spectral lines for long block. Samples per frame also 1024?

Since the MDCT is a critically sampled transform with 50% overlap, I'd expect each block to take twice as many time-domain samples as it produces spectral lines in the frequency domain.

What does that mean?
I heard that in MP3, 36 samples go into the transform and 18 samples come out.
Is that the case? How does it work in AAC?
ErikS
QUOTE(kennyzero @ Jul 3 2003 - 05:08 AM)
What does that mean?
I heard that in MP3, 36 samples go into the transform and 18 samples come out.
Is that the case? How does it work in AAC?

Critically sampled means that the total number of output samples equals the total number of input samples. With 50% overlap, the window that selects the input samples moves by only half its length for each block, so every sample affects exactly two blocks (one right after the other). If the transform gave out as many samples as it takes in, the total number of samples in the frequency domain would be twice the number you had in the time domain. So, to be critically sampled, each round of the transform must return only half as many output samples as it takes input samples. Hope that helps.
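
The argument above can be checked numerically. What follows is a minimal pure-Python sketch, not code from any MP3 or AAC encoder: it uses a textbook MDCT with a sine window, a block size of N = 8 instead of 18 or 1024, and made-up helper names (`analyze`, `synthesize`). Each hop of N new samples yields N coefficients, and overlap-adding the inverse transforms restores the input.

```python
import math


def sine_window(length):
    """Sine window; satisfies w[t]^2 + w[t + length/2]^2 = 1."""
    return [math.sin(math.pi * (t + 0.5) / length) for t in range(length)]


def mdct(block):
    """Forward MDCT: 2N time samples in, N coefficients out."""
    n = len(block) // 2
    return [
        sum(x * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t, x in enumerate(block))
        for k in range(n)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients in, 2N time-aliased samples out.

    The 2/N scale is chosen so that windowed overlap-add is exact.
    """
    n = len(coeffs)
    return [
        2.0 / n * sum(c * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                      for k, c in enumerate(coeffs))
        for t in range(2 * n)
    ]


def analyze(signal, n):
    """Slide a 2N-sample window in hops of N; len(signal) must be a multiple of N."""
    w = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n
    blocks = []
    for start in range(0, len(padded) - n, n):
        frame = [padded[start + t] * w[t] for t in range(2 * n)]
        blocks.append(mdct(frame))
    return blocks


def synthesize(blocks, n):
    """Window each inverse transform and overlap-add at hops of N."""
    w = sine_window(2 * n)
    out = [0.0] * (n * (len(blocks) + 1))
    for i, coeffs in enumerate(blocks):
        frame = imdct(coeffs)
        for t in range(2 * n):
            out[i * n + t] += frame[t] * w[t]
    return out[n:-n]  # drop the zero padding added in analyze()


N = 8
signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(4 * N)]
blocks = analyze(signal, N)
restored = synthesize(blocks, N)
max_error = max(abs(a - b) for a, b in zip(signal, restored))

print(len(blocks[0]), "coefficients per", 2 * N, "input samples")
print("max reconstruction error:", max_error)
```

The aliasing that each block introduces by keeping only N of its 2N degrees of freedom is cancelled by the neighbouring block, which is why the 50% overlap and the halved output count go together.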
wkwai
QUOTE(kennyzero @ Jul 2 2003 - 05:09 PM)

How about AAC? I just know it has 1024 spectral lines for long block. Samples per frame also 1024?

AAC uses a 2048-sample time-domain window per frame to produce 1024 spectral lines. Because consecutive windows overlap by 50%, each frame contributes 1024 new samples.
wkwai
QUOTE(kennyzero @ Jul 2 2003 - 08:08 PM)
What does that mean?
I heard that in MP3, 36 samples go into the transform and 18 samples come out.
Is that the case? How does it work in AAC?

In MP3, after the time-domain audio samples are split into 32 subband time-domain signals, a block of 36 time-domain samples is built in each band and transformed to the frequency domain with the MDCT to produce 18 frequency points. So there are 32 separate MDCT windows for each granule, and 32 * 18 gives you 576 spectral points per granule (an MP3 frame carries two granules, which is where the 1152 samples per frame come from).

At the same time a 1024 FFT is calculated in the Psychoacoustic Model to approximate the masking threshold of the MDCT coefficients.

AAC however uses a single 2048 MDCT window to produce 1024 spectral points.
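
The counts in this post can be written out as a short sketch. The constant names are made up for illustration, and the two-granules-per-frame figure is taken from the MPEG-1 Layer III format rather than from the post itself.

```python
def mdct_lines(window_length):
    """An MDCT over a window of 2N samples yields N spectral lines."""
    return window_length // 2


# MP3 (MPEG-1 Layer III), long blocks
MP3_SUBBANDS = 32       # outputs of the polyphase filterbank
MP3_MDCT_WINDOW = 36    # samples per subband fed to each MDCT
mp3_lines_per_band = mdct_lines(MP3_MDCT_WINDOW)    # 18
mp3_lines = MP3_SUBBANDS * mp3_lines_per_band       # 576

# Two 576-line granules make up one MPEG-1 Layer III frame
# (a detail of the format, not stated in the post above).
MP3_GRANULES_PER_FRAME = 2
mp3_samples_per_frame = mp3_lines * MP3_GRANULES_PER_FRAME   # 1152

# AAC, long blocks: one MDCT over the whole wideband signal
AAC_MDCT_WINDOW = 2048
aac_lines = mdct_lines(AAC_MDCT_WINDOW)             # 1024

print("MP3 lines per subband:", mp3_lines_per_band)
print("MP3 spectral lines (long block):", mp3_lines)
print("MP3 samples per frame:", mp3_samples_per_frame)
print("AAC spectral lines (long block):", aac_lines)
```

This also reconciles the 576 and 1152 figures mentioned earlier in the thread: 576 is the number of spectral lines in one long block, and 1152 is the number of time-domain samples in a frame.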

kennyzero
QUOTE(wkwai @ Jul 3 2003, 02:43 AM)
At the same time a 1024 FFT is calculated in the Psychoacoustic Model to approximate the masking threshold of the MDCT coefficients.

AAC however uses a single 2048 MDCT window to produce 1024 spectral points.

I have wondered about this for quite some time: in AAC, the MDCT takes 2048 inputs to compute 1024 spectral lines for long blocks, and 256 inputs to compute 128 lines for short blocks (please correct me if I am wrong).

FFT gets 1024 and gives 1024?
ErikS
QUOTE(kennyzero @ Jul 4 2003, 03:00 AM)
FFT gets 1024 and gives 1024?

Yes.
Ivan Dimkovic
No, FFT also gets 2048 samples to output 1024 frequency coefficients.

In AAC this could be done by "centering" the FFT window so that the first 1024 samples come from the last frame and the last 1024 samples come from the current frame.
ErikS
QUOTE(Ivan Dimkovic @ Jul 4 2003, 07:58 AM)
No, FFT also gets 2048 samples to output 1024 frequency coefficients.

A special FFT in AAC?

Edit: Hmm. Input samples are real, so you get only one half of the resulting coefficients back?
wkwai
Yes, only the first half of the FFT results is meaningful. The second half is ignored because it is just a mirror image of the first half.
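
The mirror-image property is easy to see on a small example. This is a sketch only: it uses a plain O(N^2) DFT written for clarity, N = 16 in place of the 1024 or 2048 points discussed above, and a test signal chosen for illustration.

```python
import cmath
import math


def dft(samples):
    """Plain O(N^2) DFT, written out for clarity rather than speed."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(samples))
        for k in range(n)
    ]


N = 16  # stands in for the much larger FFT sizes used by real encoders
samples = [
    math.sin(2 * math.pi * 3 * t / N) + 0.25 * math.cos(2 * math.pi * 5 * t / N)
    for t in range(N)
]
spectrum = dft(samples)

# For real input, bin N-k is the complex conjugate of bin k,
# so the upper half carries no extra information.
mirror_error = max(
    abs(spectrum[N - k] - spectrum[k].conjugate()) for k in range(1, N)
)

# Keep only the first N/2 bins, as described in the thread.
magnitudes = [abs(c) for c in spectrum[: N // 2]]

print("largest mirror mismatch:", mirror_error)
print("magnitude at bin 3:", round(magnitudes[3], 6))
print("magnitude at bin 5:", round(magnitudes[5], 6))
```

Strictly speaking, bins 0 and N/2 are their own mirrors, so a real N-point input has N/2 + 1 distinct values; the sketch keeps the first N/2, matching the 2048-in, 1024-out description above.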