QUOTE(petracci @ Apr 2 2004, 12:05 AM)
This is similar to the method of Purat and Noll, in "A new orthonormal wavelet packet decomposition for audio coding using frequency-varying modulated lapped transforms", ICASSP '96.
Hmm... is this paper publicly available?
I tried CiteSeer but was not able to download it.
QUOTE
QUOTE
To compensate for the time domain alias introduced in the first stage some "butterflies" can be applied in the new "subband transform domain" between frames.
Since at both stages you use an orthonormal transform, I do not see why these antialiasing butterflies are necessary?
They are necessary because each of the n MDCT coefficients affects up to 2*n time samples. Consider applying the MDCT to a unit impulse. The pulse is covered by 2 windows and therefore affects 2 MDCT spectra. The butterflies can be used to reduce this alias effect after the 2nd stage, so that only one pulse appears in the transformed version.
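To make the "one pulse, two spectra" point concrete, here is a minimal sketch in plain Python. It assumes a sine window and orthonormal scaling (sqrt(2/N)), and the frame size N = 8 and the pulse position are arbitrary choices of mine. It only shows how the impulse spreads over two frames and how the time-domain alias cancels on overlap-add; it does not implement the butterflies discussed here.

```python
import math

N = 8  # coefficients per frame (assumed); each frame spans 2*N samples

def window(n):
    # Sine window (assumed); satisfies w[n]^2 + w[n+N]^2 = 1
    return math.sin(math.pi * (n + 0.5) / (2 * N))

def basis(n, k):
    # MDCT cosine kernel with orthonormal scaling
    return math.sqrt(2.0 / N) * math.cos(
        math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def mdct(frame):
    # 2*N windowed samples -> N coefficients
    return [sum(window(n) * frame[n] * basis(n, k) for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs):
    # N coefficients -> 2*N windowed samples (still containing alias)
    return [window(n) * sum(coeffs[k] * basis(n, k) for k in range(N))
            for n in range(2 * N)]

# Unit impulse in a signal covered by three frames starting at 0, N, 2*N
signal = [0.0] * (4 * N)
signal[2 * N + 2] = 1.0

spectra = [mdct(signal[s:s + 2 * N]) for s in range(0, 2 * N + 1, N)]
energies = [sum(c * c for c in X) for X in spectra]

# Inverse of the middle frame alone: the pulse plus a mirrored alias term
single_frame = imdct(spectra[1])

# Overlap-add of all frames: the alias terms of adjacent frames cancel
output = [0.0] * (4 * N)
for i, X in enumerate(spectra):
    for n, v in enumerate(imdct(X)):
        output[i * N + n] += v

print("energy per frame:", energies)
print("max reconstruction error:",
      max(abs(o - s) for o, s in zip(output, signal)))
```

The pulse energy is split between the two frames that cover it, and each frame on its own reconstructs the pulse together with a mirrored alias component. Only the sum of both frames gives back the single pulse.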
QUOTE
I understand that (1) and (2) are the properties of the resulting transform, but I was interested in the criteria that you use to achieve high spectral resolution for stationary/tonal parts and high temporal resolution for non-stationary/transient parts. Are you using perceptual entropy, transient detection, or analysis-by-synthesis methods?
I'm not doing anything right now. I just tinkered a bit with this filterbank idea, checking the impulse responses of the inverse filterbank for different pulses in the transform domain. I posted it here to discuss this approach. Admittedly, I did not give many details in the first place, but it's hard to explain.

The main difference compared to other hybrid approaches is that the first stage decomposes the signal with a very high spectral resolution and then kind of reverts it for some bands, whereas common hybrid filterbanks decompose the signal into broader subbands in the first stage and do further band-splitting in the 2nd stage.
(But we can always reduce the alias effect of the first stage after the 2nd stage by applying alias-reduction butterflies.)
QUOTE
However, there is one definite "con" that you did not mention: side information. You have to tell the decoder what transform structure you used. For adaptive framing, this side information is negligible, but for adaptive frequency decompositions, this is not the case, even when you use entropy coding of the side info.
Yes, therefore I don't think it makes much sense to allow all transform variants. Just a few that prove to be a good choice in most situations. The side information will be negligible for just 8 transform variants, for example.
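A rough back-of-the-envelope check of that, as a sketch: the 8 variants come from the example above, while the sample rate, the frame hop, and the fixed-length signalling (no entropy coding) are assumptions I picked only for illustration.

```python
import math

variants = 8          # number of allowed transform variants (example above)
sample_rate = 44100   # Hz, assumed for illustration
frame_hop = 1024      # new samples per frame, assumed for illustration

# Fixed-length signalling: one index per frame, no entropy coding
bits_per_frame = math.ceil(math.log2(variants))
frames_per_second = sample_rate / frame_hop
side_info_bps = bits_per_frame * frames_per_second

print(bits_per_frame, "bits per frame")
print(round(side_info_bps, 1), "bits per second of side information")
```

Under these assumptions the side information stays far below 1 kbit/s, even without entropy coding.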
Maybe you want to check my view of the MDCT, which explains to some extent why the butterflies after the 2nd stage can be used to cancel the first set of butterflies.
see thread here

bye,
Sebastian
edit: fixed some typos (probably not all)