filterbank and mdct
Hydrogenaudio Forums > Lossy Audio Compression > AAC > AAC - Tech
TaichiOrange
I am reading the AAC spec and have a question. As far as I know, a filterbank is used to split the time-domain signal into subband time-domain signals, and the MDCT is a time-frequency transform. So why does the AAC spec describe the filterbank as an MDCT? The filterbank chapter only explains how to window and how to compute the MDCT; there is no filterbank description at all.

Thanks.
SebastianG
An MDCT gives you one transform coefficient per subband per block. The outcome of one MDCT is a frequency-domain signal that corresponds to a certain time slot. But if you take the n-th coefficient of ALL MDCTs (over all blocks), you have a time-domain signal corresponding to the n-th subband. A critically sampled filterbank trades frequency resolution for time resolution.
TaichiOrange
Thanks for your reply.

QUOTE(SebastianG @ Sep 14 2006, 14:58)

An MDCT gives you one transform coefficient per subband per block. The outcome of the MDCT is a signal in the frequency domain that corresponds to a certain time slot.

But according to the spec, a frame with the long block type has only one MDCT, with no obvious subbands. I have read the MP3 spec, and MP3 has 32 subbands in every frame.



QUOTE

But if you consider every n-th coefficient of ALL MDCTs (over all blocks) you have a time-domain signal corresponding to the n-th subband.

Could you give a more detailed explanation of this? Thanks.
SebastianG
QUOTE(TaichiOrange @ Sep 14 2006, 10:32)

Could you give a more detailed explanation of this? Thanks.

2nd more elaborate try ...

The MDCT operates on short blocks (subsets of time). After the transform (of one block!!), the result can be viewed as a frequency-domain signal that corresponds to that one block. The first coefficient corresponds to the first subband, the 2nd coefficient to the 2nd subband, and so on. But your audio signal is composed of more than one of these blocks. So, for each block you get one subband sample for every subband. Now you can arrange your transform coefficients into a 2D array like
CODE

coeffs[block aka time][subband aka frequency]

A time-domain representation of a signal is given by a function that maps time to something
A frequency-domain representation of a signal is given by a function that maps frequency to something

So, the coeffs for a complete song you get are neither a frequency-domain (1 time slot, many subbands) representation nor a time-domain (1 subband, many time slots) representation. They span both domains. However, a subset of your transform coefficients can be viewed as a time- or frequency-domain signal. If you just pick out all the coeffs of transform block t=5 you get a frequency-domain representation of that one block because all these coeffs can be addressed by a variable that corresponds to the frequency. If you just pick out all coeffs that correspond to subband f=9 you get a time-domain representation of that one subband because all these coeffs can be addressed by a variable that corresponds to the time.
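To make the coeffs[time][subband] picture concrete, here is a minimal Python sketch, not taken from any spec or encoder: a direct (slow) lapped MDCT with a sine window and a hop of N samples. N = 16, the test tone and the function names are illustrative assumptions; AAC long blocks use N = 1024 and the standard's own windows. The row coeffs[5] is the frequency-domain view of block t = 5, and the column [row[9] for row in coeffs] is the time-domain signal of subband f = 9.

```python
import math

def sine_window(length):
    # Sine window; satisfies the Princen-Bradley condition w[n]^2 + w[n + N]^2 = 1
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block, window):
    # Direct O(N^2) MDCT: 2N windowed input samples -> N transform coefficients
    n_half = len(block) // 2
    return [
        sum(window[n] * block[n]
            * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half))
        for k in range(n_half)
    ]

def analyze(signal, n_half):
    # Lapped MDCT (block length 2N, hop N) -> coeffs[block aka time][subband aka frequency]
    window = sine_window(2 * n_half)
    return [
        mdct(signal[start:start + 2 * n_half], window)
        for start in range(0, len(signal) - 2 * n_half + 1, n_half)
    ]

N = 16  # toy number of subbands (illustrative assumption)
# Test tone placed at the centre frequency of subband 9
signal = [math.cos(math.pi * (9 + 0.5) * n / N) for n in range(12 * N)]

coeffs = analyze(signal, N)
spectrum_t5 = coeffs[5]                  # frequency-domain view: block t = 5, all subbands
subband_f9 = [row[9] for row in coeffs]  # time-domain view: subband f = 9, all blocks

# The tone's energy, summed over all blocks, lands mostly in subband 9
energy = [sum(row[f] ** 2 for row in coeffs) for f in range(N)]
print(len(coeffs), len(spectrum_t5), len(subband_f9))
print(max(range(N), key=lambda f: energy[f]))
```

Note that a single coefficient of a pure tone varies from block to block with the tone's phase, which is why the sketch sums energy over all blocks before looking for the strongest subband.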

There!
TaichiOrange
Thanks. I see.

I am confused by the MPEG-1 spec, whose Layer III has a 32-subband filterbank where each subband has 18 samples. Its encoder flow is:

samples of one frame ---> filterbank ---> 18 samples per subband ---> MDCT

That is to say, in MP3 one subband has 18 frequency lines, while in AAC one subband has only one frequency line.


Gabriel
You can consider that mp3 features two cascaded filterbanks.

Sebastian: isn't it a little tricky to call mdct a filterbank? A filterbank should be doable with a bank of filters, shouldn't it?
QMF are obviously filterbanks, but should we call all time/space->freq transforms like mdct/fft/hct/hadamard filterbanks?
SebastianG
QUOTE(TaichiOrange @ Sep 15 2006, 05:42)

That is to say, in MP3 one subband has 18 frequency lines, while in AAC one subband has only one frequency line.

MP3: As Gabriel said, the first stage splits the signal into 32 subbands. The 2nd stage splits each of those subbands into 18 subbands => 576 subbands with one sample each per granule.
AAC: Only one stage, splitting the signal into 1024 subbands (for long blocks).

Since the subbands are very narrow we like to speak of "spectral lines" or "frequency lines".
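The counting above is simple arithmetic. The Python sketch below assumes a 44.1 kHz sample rate and idealised, evenly spaced lines covering 0 to fs/2; the real MP3 hybrid filterbank is not that ideal (the 32 polyphase bands overlap), so treat the widths as nominal. It also shows the trade-off mentioned earlier in the thread: more lines per block means finer frequency spacing but more new samples per block.

```python
FS_HZ = 44100                    # assumed sample rate
MP3_POLYPHASE_SUBBANDS = 32      # first stage (polyphase filterbank)
MP3_MDCT_LINES_PER_SUBBAND = 18  # second stage, long blocks
AAC_LONG_BLOCK_LINES = 1024      # single MDCT stage, long blocks

def line_width_hz(fs_hz, lines):
    # Nominal bandwidth of one spectral line when `lines` lines evenly split 0..fs/2
    return fs_hz / 2 / lines

def block_duration_ms(fs_hz, lines):
    # A critically sampled filterbank consumes `lines` new input samples per block
    return 1000 * lines / fs_hz

mp3_lines = MP3_POLYPHASE_SUBBANDS * MP3_MDCT_LINES_PER_SUBBAND
for name, lines in (("MP3 granule", mp3_lines), ("AAC long block", AAC_LONG_BLOCK_LINES)):
    print(f"{name}: {lines} lines, "
          f"{line_width_hz(FS_HZ, lines):.2f} Hz per line, "
          f"{block_duration_ms(FS_HZ, lines):.2f} ms of new samples per block")
```

Under these assumptions this gives 576 lines of about 38.28 Hz each per 13.06 ms for MP3, versus 1024 lines of about 21.53 Hz each per 23.22 ms for AAC long blocks.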

QUOTE(Gabriel @ Sep 15 2006, 07:53)

isn't it a little tricky to call mdct a filterbank? A filterbank should be doable with a bank of filters, shouldn't it?
QMF are obviously filterbanks, but should we call all time/space->freq transforms like mdct/fft/hct/hadamard filterbanks?

Why not? These are types of critically sampled filterbanks. Take the iMDCT, for example: if you set one transform coefficient to 1 and the rest to 0 and compute the inverse transform, you get the impulse response of the synthesis filter for the specific subband whose sample you set to 1. Assuming identical analysis and synthesis windows, which corresponds to an orthogonal filterbank, the analysis filter's impulse response is the time-reversed version of the synthesis impulse response.
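A small Python check of the iMDCT experiment just described, under textbook assumptions: N = 16 subbands, a sine window for both analysis and synthesis, and 1/N normalisation on the inverse (the sizes and function names are illustrative). It feeds a unit vector for subband k = 5 into the windowed iMDCT, confirms the output is simply a windowed cosine, and evaluates that impulse response's magnitude at every subband centre frequency to show it acts as a band-pass filter for subband 5.

```python
import cmath
import math

N = 16  # toy number of subbands
window = [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def cosine_kernel(k, n):
    # MDCT/iMDCT basis function for subband k at time index n
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def windowed_imdct(coeffs):
    # iMDCT with 1/N normalisation, followed by the synthesis window
    return [window[n] / N * sum(c * cosine_kernel(k, n) for k, c in enumerate(coeffs))
            for n in range(2 * N)]

k = 5
unit = [1.0 if i == k else 0.0 for i in range(N)]
synthesis_ir = windowed_imdct(unit)  # impulse response of synthesis filter k

# It is exactly a windowed cosine wave
windowed_cosine = [window[n] / N * cosine_kernel(k, n) for n in range(2 * N)]

# Orthogonal case: analysis filter = time-reversed synthesis response (up to the 1/N scale)
analysis_ir = [N * h for h in reversed(synthesis_ir)]

def magnitude_at(h, omega):
    # |H(e^{j omega})| of an FIR impulse response h
    return abs(sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(h)))

centres = [math.pi * (j + 0.5) / N for j in range(N)]  # subband centre frequencies
response = [magnitude_at(synthesis_ir, w) for w in centres]
print(max(range(N), key=lambda j: response[j]))  # strongest at subband k
```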

But nobody should use other transforms (like the FFT) as a filterbank because their analysis/synthesis filters suck (energy leakage due to not having a smooth window / no overlap).
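The window part of that remark can be illustrated numerically. The Python sketch below is an illustration, not a measurement from any codec: it takes a 64-point DFT of a cosine that falls between bins (bin 10.3, an arbitrary choice) and compares how much energy lands more than 3 bins away from the tone with a rectangular window (a plain FFT of a block) versus a smooth sine window. It does not model the overlap part of the argument.

```python
import cmath
import math

M = 64           # DFT size
TONE_BIN = 10.3  # tone between two DFT bins (illustrative choice)

def dft_magnitudes(x):
    # |X[k]| for k = 0..M/2 (non-negative frequencies of a real signal)
    m = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / m) for n in range(m)))
            for k in range(m // 2 + 1)]

def far_leakage(window):
    # Fraction of the non-negative-frequency energy lying more than 3 bins from the tone
    x = [window[n] * math.cos(2 * math.pi * TONE_BIN * n / M) for n in range(M)]
    energy = [v * v for v in dft_magnitudes(x)]
    far = sum(e for k, e in enumerate(energy) if abs(k - TONE_BIN) > 3)
    return far / sum(energy)

rectangular = [1.0] * M
sine = [math.sin(math.pi * (n + 0.5) / M) for n in range(M)]
print(f"rectangular: {far_leakage(rectangular):.2e}, sine: {far_leakage(sine):.2e}")
```

The rectangular window's sidelobes fall off much more slowly, so noticeably more of the tone's energy leaks into distant bins.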
Gabriel
QUOTE(SebastianG @ Sep 15 2006, 10:18)

QUOTE(Gabriel @ Sep 15 2006, 07:53)

isn't it a little tricky to call mdct a filterbank? A filterbank should be doable with a bank of filters, shouldn't it?
QMF are obviously filterbanks, but should we call all time/space->freq transforms like mdct/fft/hct/hadamard filterbanks?

Why not? These are types of critically sampled filterbanks. Take the iMDCT, for example: if you set one transform coefficient to 1 and the rest to 0 and compute the inverse transform, you get the impulse response of the synthesis filter for the specific subband whose sample you set to 1. Assuming identical analysis and synthesis windows, which corresponds to an orthogonal filterbank, the analysis filter's impulse response is the time-reversed version of the synthesis impulse response.


I am not questioning the fact that those are critically sampled time-to-frequency transforms. What I am wondering about is the semantic limits of the word "filterbank".
Let's consider the mp2/mp3 PQMF: in this case it's a bank of 32 filters, thus a "filterbank".
But take an FFT as an example: the number of separate filters needed to compute it with a set of separate filters is the same as the number of points of the FFT, with a very, very, very sharp transition band on each filter, and each subband is just 1 point wide.
In this case, I wonder whether it's still reasonable to use the word "filterbank".
SebastianG
QUOTE(Gabriel @ Sep 15 2006, 12:01)

Let's consider the mp2/mp3 pqmf: in this case it's a bank of 32 filters, thus a "filterbank"

QUOTE(Gabriel @ Sep 15 2006, 07:53)

Isn't it a little tricky to call mdct a filterbank?
A filterbank should be doable with a bank of filters, shouldn't it?

Yes. And this applies to the MDCT as well.

QUOTE(Gabriel @ Sep 15 2006, 12:01)

I am wondering in this case [FFT] if it's still reasonable to use the "filterbank" word.

I agree with you that when you speak of one FFT on a block of samples, the word "filterbank" might be confusing. However, if you apply the FFT to consecutive blocks of the same size (with whatever window and overlap), this process fits our definition of a filterbank.
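The claim that FFTs of consecutive blocks form a filterbank can be checked directly. In this Python sketch (block size M = 8, hop H = 4, sine window and a random test signal are all illustrative assumptions), DFT bin k taken from every block equals the output of a single complex band-pass FIR filter, the time-reversed windowed complex exponential for bin k, sampled at every H-th output. With this overlap the result is redundant (oversampled), not critically sampled.

```python
import cmath
import math
import random

M = 8  # DFT (block) size
H = 4  # hop size, i.e. 50 % overlap
window = [math.sin(math.pi * (n + 0.5) / M) for n in range(M)]

def stft(x):
    # One windowed DFT per block: frames[block][bin]
    return [[sum(window[n] * x[s + n] * cmath.exp(-2j * math.pi * k * n / M) for n in range(M))
             for k in range(M)]
            for s in range(0, len(x) - M + 1, H)]

def convolve(x, h):
    # Full linear convolution (len(x) + len(h) - 1 outputs)
    y = [0j] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y

def bin_via_filter(x, k):
    # Band-pass filter for bin k: time-reversed windowed complex exponential,
    # followed by keeping every H-th output (downsampling by H)
    h = [window[M - 1 - j] * cmath.exp(-2j * math.pi * k * (M - 1 - j) / M) for j in range(M)]
    y = convolve(x, h)
    num_blocks = (len(x) - M) // H + 1
    return [y[m * H + M - 1] for m in range(num_blocks)]

rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(40)]
frames = stft(x)
worst = max(abs(frames[m][k] - v)
            for k in range(M)
            for m, v in enumerate(bin_via_filter(x, k)))
print(f"largest mismatch: {worst:.1e}")
```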
Gabriel
QUOTE(SebastianG @ Sep 15 2006, 12:54)

I agree with you that when you speak of one FFT on a block of samples, the word "filterbank" might be confusing. However, if you apply the FFT to consecutive blocks of the same size (with whatever window and overlap), this process fits our definition of a filterbank.

If I understand you correctly, you consider a set of successive critically sampled time-to-frequency transforms to be a filterbank, so our FFT or MDCT over successive frames is a filterbank over the whole track?

While it's true, I really have the feeling that you are twisting things a little to fit a filterbank into this process...
SebastianG
I don't see why we are arguing about this.
You seem to basically agree with me ("While it's true...") but don't like using the term filterbank for things like the MDCT. I'm fine with that. I wouldn't call STFTs applied consecutively to a signal a filterbank either, even though that does fit our definition of a filterbank.
Gabriel
I am not arguing, just trying to understand your point of view.
As filterbanks were not part of my formal education, I am not totally sure about their definition in borderline cases. That is why I am interested in other people's knowledgeable opinions.

edit: added a missing letter
SebastianG
It's just that ... a bank of filters working on the incoming data (in parallel). In the case of critically sampled filterbanks, the overall sampling rate of the output matches the sampling rate of the input. So, a critically sampled analysis filterbank includes downsampling at the end, and a critically sampled synthesis filterbank includes upsampling at the beginning.

Note: up/down sampling refers to inserting zeros/throwing samples away without filtering. Interpolation/Decimation refers to up/down sampling coupled with anti-alias/imaging filter.

It may not be obvious at first, but ... doing a lapped MDCT as in MP3 and AAC is nothing other than a bunch of filters working in parallel on the incoming data, followed by downsampling to make it critically sampled. The windowed cosine waves are the filters' impulse responses.
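That description can be verified numerically. This Python sketch (toy size N = 8, sine window and random input are illustrative assumptions, and it uses a direct O(N^2) MDCT rather than any encoder's fast algorithm) computes lapped MDCT coefficients two ways: block by block, and as N FIR filters running in parallel on the whole signal, each followed by keeping every N-th output. Filter k's impulse response is the time-reversed windowed cosine for subband k. Each hop of N new input samples yields N real coefficients, which is what makes the lapped MDCT critically sampled.

```python
import math
import random

N = 8  # toy number of subbands
window = [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def basis(k, n):
    # MDCT cosine for subband k at index n of a 2N-sample block
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def mdct_blocks(x):
    # Lapped MDCT, block length 2N, hop N -> coeffs[block][subband]
    return [[sum(window[n] * x[s + n] * basis(k, n) for n in range(2 * N)) for k in range(N)]
            for s in range(0, len(x) - 2 * N + 1, N)]

def fir_filter(x, h):
    # Full linear convolution of x with impulse response h
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y

def filterbank_blocks(x):
    # N filters in parallel, each followed by downsampling by N (keep every N-th output)
    num_blocks = (len(x) - 2 * N) // N + 1
    subband_signals = []
    for k in range(N):
        # Impulse response of analysis filter k: time-reversed windowed cosine
        h = [window[2 * N - 1 - j] * basis(k, 2 * N - 1 - j) for j in range(2 * N)]
        y = fir_filter(x, h)
        subband_signals.append([y[m * N + 2 * N - 1] for m in range(num_blocks)])
    # Transpose subband_signals[subband][block] into coeffs[block][subband]
    return [list(row) for row in zip(*subband_signals)]

rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(6 * N)]
direct = mdct_blocks(x)
via_filters = filterbank_blocks(x)
worst = max(abs(a - b) for ra, rb in zip(direct, via_filters) for a, b in zip(ra, rb))
print(len(direct), "blocks of", N, "coefficients; largest mismatch:", worst)
```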
TaichiOrange
Thanks for the information, although there are some points I do not understand very well. Maybe I need to refer to a textbook.

QUOTE(SebastianG @ Sep 15 2006, 20:30)

Interpolation/Decimation refers to up/down sampling coupled with anti-alias/imaging filter.

Should it be: Interpolation/Decimation refers to up/down sampling coupled with imaging filter/anti-alias filter? As far as I know, upsampling produces image frequencies.

TaichiOrange
QUOTE(SebastianG @ Sep 15 2006, 20:30)

It's just that ... a bank of filters working on the incoming data (in parallel). In the case of critically sampled filterbanks, the overall sampling rate of the output matches the sampling rate of the input. So, a critically sampled analysis filterbank includes downsampling at the end, and a critically sampled synthesis filterbank includes upsampling at the beginning.

Note: up/down sampling refers to inserting zeros/throwing samples away without filtering. Interpolation/Decimation refers to up/down sampling coupled with anti-alias/imaging filter.



As I understand it, we can consider the FFT a critically sampled filterbank, because every subband has one frequency line. But then it does not need to do downsampling and upsampling? Thanks.
edit: I have understood your description of upsampling and downsampling. Thanks.
SebastianG
QUOTE(TaichiOrange @ Sep 15 2006, 17:06)

Should it be: Interpolation/Decimation refers to up/down sampling coupled with imaging filter/anti-alias filter? As far as I know, upsampling produces image frequencies.

It's "anti" for both.

QUOTE(TaichiOrange @ Sep 15 2006, 17:22)

As I understand it, we can consider the FFT a critically sampled filterbank, because every subband has one frequency line. But then it does not need to do downsampling and upsampling? Thanks.

Sorry, I'm not gonna elaborate on this any longer.
It looks more like a fundamental understanding problem.
You're probably best served with a good textbook on those things.

edit: I just saw your changes. I'm glad something made it through. :)
TaichiOrange
QUOTE(SebastianG @ Sep 16 2006, 00:08)

QUOTE(TaichiOrange @ Sep 15 2006, 17:06)

Should it be: Interpolation/Decimation refers to up/down sampling coupled with imaging filter/anti-alias filter? As far as I know, upsampling produces image frequencies.

It's "anti" for both.



What I mean is: upsampling only needs an imaging filter, because upsampling only causes image frequencies but no aliasing. Downsampling only needs an anti-alias filter, because downsampling does not cause image frequencies.
I am new to DSP; thanks for your reply and explanation.


SebastianG
Ohh...!
Now I see what you mean. I didn't pay attention to the order of the words I used. Yes, of course, you're right.
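The corrected pairing (upsampling creates images, downsampling creates aliases) is easy to see with the unfiltered zero insertion and sample dropping defined earlier in the thread. In this Python sketch, where the tone frequencies and lengths are arbitrary illustrative choices, upsampling a tone at DFT bin 5 of 64 by 2 produces a second, image peak at bin 59 of 128. Dropping every other sample of a tone at bin 40 of 128, which lies above the new Nyquist frequency, makes it reappear at bin 24 of 64, i.e. aliased.

```python
import cmath
import math

def upsample(x, factor):
    # Insert factor - 1 zeros after every sample, no filtering
    out = []
    for v in x:
        out.append(v)
        out.extend([0.0] * (factor - 1))
    return out

def downsample(x, factor):
    # Keep every factor-th sample, no filtering
    return x[::factor]

def peak_bins(x, count):
    # The `count` strongest DFT bins among k = 0..len(x)/2, in ascending order
    m = len(x)
    mags = [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / m) for n in range(m)))
            for k in range(m // 2 + 1)]
    strongest = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:count]
    return sorted(strongest)

low_tone = [math.cos(2 * math.pi * 5 * n / 64) for n in range(64)]      # bin 5 of 64
high_tone = [math.cos(2 * math.pi * 40 * n / 128) for n in range(128)]  # bin 40 of 128

print(peak_bins(upsample(low_tone, 2), 2))     # original tone plus its image: [5, 59]
print(peak_bins(downsample(high_tone, 2), 1))  # aliased tone: [24]
```

An anti-imaging (interpolation) filter would remove the bin 59 peak after upsampling, and an anti-alias (decimation) filter would remove the bin 40 tone before downsampling.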