Full Version: Wavelet Filterbanks
Hydrogenaudio Forums > Hydrogenaudio Forum > Scientific Discussion
pest
I'm experimenting with wavelet filterbanks and audio compression.
Maybe someone can help enlighten me a bit, because I am no genius when it comes
to mathematics.

How long should the filters be? I experimented with short integer wavelets
but got horrible results if I don't normalize the transform in the quantization step.
Also, longer filters seem to have a better time/frequency resolution.

Do I have to overlap the transform to avoid block-boundary artifacts?
In my current version I only do a symmetric extension if necessary.
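
For reference, the symmetric extension mentioned above can be sketched roughly as follows. This is only an illustration under assumptions that are not from the post: an odd-length symmetric filter (for example a 9-tap biorthogonal lowpass), whole-sample mirroring at the block edges, and the hypothetical helper names `extend_block` and `filter_block`. It filters one block in isolation, with no overlap into its neighbours.

```python
import numpy as np

def extend_block(block, taps):
    """Mirror the block at both ends without repeating the edge sample.

    Assumes an odd number of taps; numpy's "reflect" mode gives the
    whole-sample symmetric extension usually paired with such filters.
    """
    half = taps // 2
    return np.pad(block, half, mode="reflect")

def filter_block(block, kernel):
    """Filter one block on its own; the output has the same length as the input."""
    extended = extend_block(block, len(kernel))
    return np.convolve(extended, kernel, mode="valid")

# Hypothetical 3-tap smoothing kernel, used only to keep the demo small.
x = np.arange(8, dtype=float)
y = filter_block(x, np.array([0.25, 0.5, 0.25]))
print(extend_block(x, 3))
print(y)
```

With a time-invariant filterbank the mirrored samples keep the block length unchanged, so nothing extra has to be transmitted for the edges.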

Is it possible to directly apply the calculated maskers in the wavelet domain?
In my experiments that didn't seem to be the case, because I need about 300kbit
to get acceptable quality.

Has anyone experimented with adaptive filterbanks?

Any help would be great.

pest
Garf
QUOTE(pest @ Jun 11 2006, 11:43) *
I'm experimenting with wavelet filterbanks and audio compression.


I can tell you the result of your experiments already: it's going to suck. Now, didn't I just save you a lot of time? :)

QUOTE

How long should the filters be? I experimented with short integer wavelets
but got horrible results if I don't normalize the transform in the quantization step.
Also, longer filters seem to have a better time/frequency resolution.


It depends on what kind of filters you are using and what kind of thing you want out of them.

QUOTE

Do I have to overlap the transform to avoid block-boundary artifacts?
In my current version I only do a symmetric extension if necessary.


Yes if the filterbank is time-varying, no if it is not.

QUOTE

Is it possible to directly apply the calculated maskers in the wavelet domain?


How are you calculating the maskers? What is the nature of the wavelets?

QUOTE

In my experiments that didn't seem to be the case, because I need about 300kbit
to get acceptable quality.


300kbits what? mono/stereo? What kind of quantization? Using lossless entropy coding?

QUOTE

Has anyone experimented with adaptive filterbanks?


Loads :)

pest
Thanks for your answers,

QUOTE

Now, didn't I just save you a lot of time? :)


I have enough of it :)

QUOTE

It depends on what kind of filters you are using and what kind of thing you want out of them.


You're right. I just thought there was something like a critical length needed to get
acceptable results, because I often read about lengths around 20. I could
make the length adaptive to match the signal characteristics.

QUOTE

Yes if the filterbank is time-varying, no if it is not.


Does time-varying mean different frame sizes?


QUOTE

How are you calculating the maskers? What is the nature of the wavelets?


I'm calculating the maskers with a simple MPEG-1 psychoacoustic model.
The wavelets are biorthogonal and short (9 taps).


QUOTE

300kbits what? mono/stereo? What kind of quantization? Using lossless entropy coding?


Stereo streams. I am using linear quantization with a deadzone and
a binary bitplane coder which kicks Huffman's ass. ;)
Perhaps I should switch to something nonlinear,
because the results are really OK for samples with low amplitude.
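
As a point of reference, linear quantization with a deadzone can be sketched like this. It is a generic textbook form, not necessarily what the coder above does: the `rounding` parameter and both function names are assumptions made for the example. With `rounding=0.5` it is an ordinary rounding quantizer, and smaller values widen the zero bin.

```python
import numpy as np

def deadzone_quantize(x, step, rounding=0.0):
    """Uniform quantizer with a widened zero bin.

    rounding=0.5 is plain rounding; rounding=0.0 makes the zero bin
    two steps wide, so every |x| < step becomes 0.
    """
    x = np.asarray(x, dtype=float)
    return (np.sign(x) * np.floor(np.abs(x) / step + rounding)).astype(int)

def deadzone_dequantize(q, step, rounding=0.0):
    """Reconstruct each nonzero index at the midpoint of its bin."""
    q = np.asarray(q)
    magnitude = (np.abs(q) + 0.5 - rounding) * step
    return np.where(q == 0, 0.0, np.sign(q) * magnitude)

samples = [0.9, -0.9, 1.2, -2.7]
indices = deadzone_quantize(samples, step=1.0)
print(indices, deadzone_dequantize(indices, step=1.0))
```

Because the step is fixed per band here, a band needs some normalization (for example a per-band scale) before the same `step` means the same thing everywhere.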


QUOTE

Has anyone experimented with adaptive filterbanks?
QUOTE

Loads :)



Is there a simple way to determine a good basis based on psychoacoustics?
I'm using an energy criterion but that seems to favor the wrong bands.

Thanks, you're great.
Garf
QUOTE(pest @ Jun 11 2006, 12:59) *

QUOTE

It depends on what kind of filters you are using and what kind of thing you want out of them.


You're right. I just thought there was something like a critical length needed to get
acceptable results, because I often read about lengths around 20. I could
make the length adaptive to match the signal characteristics.


You can safely assume that there's no agreed-upon way to apply wavelets to audio coding. I mean, if all results suck, it's rather hard to talk about things like "critical length" and "acceptable results".

QUOTE

QUOTE

Yes if the filterbank is time-varying, no if it is not.


Does time-varying mean different frame sizes?


I was thinking about adaptive filterbanks, which seems to be what everybody wants to do with wavelets.

QUOTE

QUOTE

How are you calculating the maskers? What is the nature of the wavelets?


I'm calculating the maskers with a simple MPEG-1 psychoacoustic model.
The wavelets are biorthogonal and short (9 taps).


Take a look at the frequency responses of the MPEG-1 model's FFT and that of your wavelets.

QUOTE

QUOTE

300kbits what? mono/stereo? What kind of quantization? Using lossless entropy coding?


Stereo streams. I am using linear quantization with a deadzone and
a binary bitplane coder which kicks Huffman's ass. ;)


Binary bitplane coder? Kick Huffman's ass? You are probably seriously misunderstanding something in that area, but without any details it's impossible to see what.

QUOTE

Perhaps I should switch to something nonlinear,
because the results are really OK for samples with low amplitude.


Maybe. Maybe you're missing some kind of scalefactor or other normalization structure?

QUOTE

Is there a simple way to determine a good basis based on psychoacoustics?
I'm using an energy criterion but that seems to favor the wrong bands.

Thanks, you're great.


In my opinion this is the optimal one: you always pick a DCT basis, never the wavelet one :)

If you want more comments it's probably a good idea to give a complete overview of what you're doing. Otherwise any response is just shooting in the dark, really.
pest
QUOTE

I mean, if all results suck, it's rather hard to talk about things like "critical length" and "acceptable results".


Really? Do you know why wavelets get so much attention in audio coding?

QUOTE

Take a look at the frequency responses of the MPEG-1 model's FFT and that of your wavelets.


It's different :P - back to school

QUOTE

Binary bitplane coder? Kick Huffman's ass? You are probably seriously misunderstanding something in that area, but without any details it's impossible to see what.


The entropy coder scans every bitplane. The probability of the current bit being '1' is estimated
with an adaptive context model based on the neighbourhood, and the bit is
encoded with a range coder. It's slow, but it works really well.
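
To make the idea concrete, here is a small sketch of context-adaptive bitplane coding. It does not implement a range coder. Instead it adds up the ideal code length (-log2 of the modelled probability) that a range coder would approach. Everything specific is an assumption for illustration: the count-based `AdaptiveBit` model, the context made of (plane, already significant, left neighbour significant), and the flat 1 bit per sign.

```python
import math
import numpy as np

class AdaptiveBit:
    """Adaptive probability estimate for one context, based on simple counts."""
    def __init__(self):
        self.counts = [1, 1]  # occurrences of bit 0 and bit 1 so far

    def cost_and_update(self, bit):
        p = self.counts[bit] / (self.counts[0] + self.counts[1])
        self.counts[bit] += 1
        return -math.log2(p)  # ideal cost in bits for this symbol

def bitplane_cost(coeffs):
    """Estimated bits for coding integer coefficients plane by plane, MSB first."""
    mags = np.abs(np.asarray(coeffs, dtype=np.int64))
    if mags.size == 0 or mags.max() == 0:
        return 0.0  # simplification: a real coder would still signal the plane count
    planes = int(mags.max()).bit_length()
    models = {}
    significant = [False] * len(mags)
    bits = 0.0
    for plane in range(planes - 1, -1, -1):
        for i, m in enumerate(mags):
            bit = (int(m) >> plane) & 1
            left = significant[i - 1] if i > 0 else False
            ctx = (plane, significant[i], left)  # only uses already coded data
            bits += models.setdefault(ctx, AdaptiveBit()).cost_and_update(bit)
            if bit and not significant[i]:
                significant[i] = True
                bits += 1.0  # sign, sent once when the coefficient turns significant
    return bits

sparse = np.zeros(256, dtype=int)
sparse[10] = 7
print(bitplane_cost(sparse))
```

Since the context only depends on bits the decoder has already seen, the decoder can rebuild the same probabilities without side information.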

The coder is really simple and I don't think that I've completely missed something.
Short description:
The input is split into frames of 2048 samples.
The psychoacoustic model calculates the maskers.
I do the wavelet transform with a basis (17 subbands) that resembles some of the critical bands.
Then I quantize the bands based on the maskers and encode the bands with the bitplane coder.

Great that you've helped me. I'm looking deeper into the current lack of correct scaling.

[edit]
number of subbands added
Garf
QUOTE(pest @ Jun 11 2006, 14:18) *
QUOTE

I mean, if all results suck, it's rather hard to talk about things like "critical length" and "acceptable results".


Really? Do you know why wavelets get so much attention in audio coding?


People like buzzwords? Ignorance? No baseline to compare against so you can get away with publishing basically anything?

I don't know of any single good reason, though.

QUOTE

QUOTE

Take a look at the frequency responses of the MPEG-1 model's FFT and that of your wavelets.


It's different :P - back to school


It doesn't have to be exactly the same, but you probably saw the problem already?

QUOTE

The entropy coder scans every bitplane. The probability of the current bit being '1' is estimated
with an adaptive context model based on the neighbourhood, and the bit is
encoded with a range coder. It's slow, but it works really well.


Ah, so it's really more like arithmetic coding/CABAC :)

QUOTE

The coder is really simple and I don't think that I've completely missed something.
Short description:
The input is split into frames of 2048 samples.
The psychoacoustic model calculates the maskers.
I do the wavelet transform with a basis (17 subbands) that resembles some of the critical bands.
Then I quantize the bands based on the maskers and encode the bands with the bitplane coder.

Great that you've helped me. I'm looking deeper into the current lack of correct scaling.


Why would this ever perform better or even comparable to a DCT based coder?
SebastianG
QUOTE(Garf @ Jun 11 2006, 20:43) *

Why would this ever perform better or even comparable to a DCT based coder?


That's an important point.

The thing is -- like JJ mentioned in one of his presentations (I lost the link, but Garf may have it somewhere) -- a transform should do 2 things:
- allow accurate quantization noise distribution according to psychoacoustics
- give an energy compact representation of the audio signal (decorrelation)

The bad news: These can't usually be met simultaneously. Example: Sinusoids are compactly represented via MDCT but the MDCT gives a pretty low temporal resolution (too low for upper frequencies) which might result in pre-/post echo.

While the wavelet approach may result in a filterbank which kind of resembles critical bands (this would help distribute quantization noise the way the psychoacoustic model suggests), it really sucks at decorrelating sinusoidal signal parts and thus requires a high bitrate in those situations.
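
A quick numerical illustration of the energy compaction point, with stand-ins that are assumptions and not what any coder in this thread uses: a plain orthonormal DCT-II in place of the MDCT, a full Haar decomposition in place of a longer wavelet tree, and an arbitrary test frequency. It counts how many of the largest coefficients are needed to hold 90% of the energy of a steady sinusoid.

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT-II matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)
    return c

def haar(x):
    """Full orthonormal Haar decomposition; len(x) must be a power of two."""
    x = np.asarray(x, dtype=float)
    bands = []
    while len(x) > 1:
        bands.append((x[0::2] - x[1::2]) / np.sqrt(2.0))  # detail band
        x = (x[0::2] + x[1::2]) / np.sqrt(2.0)            # remaining lowpass
    bands.append(x)
    return np.concatenate(bands)

def coeffs_for_energy(c, fraction=0.9):
    """Number of largest coefficients needed to hold `fraction` of the energy."""
    cumulative = np.cumsum(np.sort(c ** 2)[::-1])
    return int(np.searchsorted(cumulative, fraction * cumulative[-1]) + 1)

n = 256
x = np.sin(2 * np.pi * 0.0537 * np.arange(n) + 0.4)  # steady sinusoid
dct_count = coeffs_for_energy(dct2_matrix(n) @ x)
haar_count = coeffs_for_energy(haar(x))
print("DCT:", dct_count, "Haar:", haar_count)
```

The fewer coefficients carry the energy, the fewer have to be quantized finely, which is the decorrelation argument in a nutshell.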

Switching between MDCT and wavelet packet transform is possible but IMHO really not worth the effort compared to blocksize-switched MDCT+TNS.


Sebastian
pest
QUOTE

Why would this ever perform better or even comparable to a DCT based coder?


I try to compensate for the weakness on sinusoids with more sophisticated entropy coding,
and I like the possibilities that wavelets offer. There's no final goal, but after
implementing a simple wavelet video codec I'm trying this one. You can always learn something :)


QUOTE

While the wavelet approach may result in a filterbank which kind of resembles critical bands (this would help distribute quantization noise the way the psychoacoustic model suggests), it really sucks at decorrelating sinusoidal signal parts and thus requires a high bitrate in those situations.


I'm currently experimenting with linear predictive coding on the subband data. One thing that
bothers me is that most papers suggest using subband prediction prior to quantization,
but I fear my LMS filter could become unstable if the quantization noise is too high. Is there any
disadvantage in using it after quantization?
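
For what it's worth, prediction after quantization can be sketched as below. This only illustrates the mechanics and makes no claim about coding gain. The normalized LMS update, the order, the step size `mu` and all function names are assumptions. The predictor only ever sees quantized values that were already coded, so a decoder running the same loop stays in step with the encoder.

```python
import numpy as np

def _nlms_run(values, decode, order, mu):
    """Shared encoder/decoder loop so both sides adapt identically."""
    w = np.zeros(order)
    hist = np.zeros(order)  # most recent quantized samples, newest first
    out = np.empty(len(values), dtype=np.int64)
    for n, v in enumerate(values):
        y = float(w @ hist)
        pred = int(np.rint(y))
        sample = int(v) + pred if decode else int(v)
        out[n] = sample if decode else sample - pred
        # Normalized LMS update driven by the quantized sample itself.
        w += mu * (sample - y) * hist / (hist @ hist + 1e-9)
        hist = np.roll(hist, 1)
        hist[0] = sample
    return out

def nlms_residual(q, order=4, mu=0.5):
    """Integer prediction residual of already quantized subband samples."""
    return _nlms_run(q, decode=False, order=order, mu=mu)

def nlms_reconstruct(res, order=4, mu=0.5):
    """Inverse of nlms_residual, as the decoder would run it."""
    return _nlms_run(res, decode=True, order=order, mu=mu)

q = np.rint(20 * np.sin(0.3 * np.arange(200))).astype(int)  # toy quantized band
res = nlms_residual(q)
print(np.abs(q).sum(), np.abs(res).sum())
```

The residual is lossless with respect to the quantized values, so the quantization noise itself is unaffected by the predictor.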
Woodinville
QUOTE(SebastianG @ Jun 12 2006, 00:25) *
The thing is -- like JJ mentioned in one of his presentations (I lost the link, but Garf may have it somewhere) -- a transform should do 2 things:
- allow accurate quantization noise distribution according to psychoacoustics
- give an energy compact representation of the audio signal (decorrelation)...


http://mue.music.miami.edu/AES/PerceptualCoding.ppt Might be what you were looking for. If it's not, you could check http://mue.music.miami.edu/AES/ where there is a bunch of stuff jj did at Miami in two days running.
HotshotGG
QUOTE
While the wavelet approach may result in a filterbank which kind of resembles critical bands (this would help distribute quantization noise the way the psychoacoustic model suggests), it really sucks at decorrelating sinusoidal signal parts and thus requires a high bitrate in those situations.


I don't know too much about multi-rate signal processing, but are there any wavelets that have good decorrelation properties? Also, does the KLT have good decorrelation properties? Without going off topic, I'm asking just out of curiosity, because I have seen it used elsewhere.
Garf
QUOTE(pest @ Jun 12 2006, 13:40) *
QUOTE

Why would this ever perform better or even comparable to a DCT based coder?


I try to compensate for the weakness on sinusoids with more sophisticated entropy coding


Sure, but that doesn't have anything to do with the transform itself...


QUOTE(HotshotGG @ Jun 12 2006, 21:32) *

Also, does the KLT have good decorrelation properties?


It's optimal by definition.
Woodinville
QUOTE(HotshotGG @ Jun 12 2006, 12:32) *

QUOTE
While the wavelet approach may result in a filterbank which kind of resembles critical bands (this would help distribute quantization noise the way the psychoacoustic model suggests), it really sucks at decorrelating sinusoidal signal parts and thus requires a high bitrate in those situations.


I don't know too much about multi-rate signal processing, but are there any wavelets that have good decorrelation properties? Also, does the KLT have good decorrelation properties? Without going off topic, I'm asking just out of curiosity, because I have seen it used elsewhere.


Well, you have to realize that good decorrelation properties are, in general, analogous to having sharp, narrow frequency response.

The KLT is defined as optimum for the distribution it was designed for, of course.
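
A minimal sketch of what "optimum for the distribution it was designed for" means in practice. The AR(1) test signal, the frame size of 8 and the function name `klt_basis` are assumptions made for the example. The basis is the set of eigenvectors of the covariance estimated from the training frames, and it diagonalizes exactly that covariance, nothing more.

```python
import numpy as np

def klt_basis(frames):
    """Eigenvectors of the frame covariance, sorted by descending variance."""
    cov = np.cov(frames, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order], eigvals[order]

# Toy source: strongly correlated AR(1) noise cut into frames of 8 samples.
rng = np.random.default_rng(0)
noise = rng.standard_normal(8 * 4000)
signal = np.empty_like(noise)
prev = 0.0
for i, e in enumerate(noise):
    prev = 0.95 * prev + e
    signal[i] = prev
frames = signal.reshape(-1, 8)

basis, variances = klt_basis(frames)
coeffs = (frames - frames.mean(axis=0)) @ basis
coeff_cov = np.cov(coeffs, rowvar=False)
off_diag = coeff_cov - np.diag(np.diag(coeff_cov))
print(variances)
print(np.abs(off_diag).max())  # close to zero for the training data
```

The basis has to be re-estimated (and sent to the decoder) whenever the signal statistics change.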
HotshotGG
QUOTE
Well, you have to realize that good decorrelation properties are, in general, analogous to having sharp, narrow frequency response.

The KLT is defined as optimum for the distribution it was designed for, of course.


I see, interesting. Thank you for clarifying that. I had seen the transform used in a research implementation of an MPEG-2 AAC encoder, and it's the only other transform I have ever seen used for audio coding aside from the DCT-IV, at least. :D
Woodinville
QUOTE(HotshotGG @ Jun 13 2006, 16:25) *

QUOTE
Well, you have to realize that good decorrelation properties are, in general, analogous to having sharp, narrow frequency response.

The KLT is defined as optimum for the distribution it was designed for, of course.


I see, interesting. Thank you for clarifying that. I had seen the transform used in a research implementation of an MPEG-2 AAC encoder, and it's the only other transform I have ever seen used for audio coding aside from the DCT-IV, at least. :D


Really? A KLT was used where in the MPEG-2 AAC research? I am very curious; if you could expound a bit more I'd be grateful.
HotshotGG
QUOTE
Really? A KLT was used where in the MPEG-2 AAC research? I am very curious; if you could expound a bit more I'd be grateful.


I forget the author's name; the paper comes from the UCLA Electrical Engineering department. I think the author's name is Chris Kayakis. I have it lying around on my HD somewhere. Let me take a look, and if I find it I will upload it for you. The authors were using the KLT and then comparing a scalar quantization and entropy coding scheme against a VQ one as the backend. This was a multichannel coder, I should add. ;)
Woodinville
QUOTE(HotshotGG @ Jun 13 2006, 21:27) *

QUOTE
Really? A KLT was used where in the MPEG-2 AAC research? I am very curious; if you could expound a bit more I'd be grateful.


I forget the author's name; the paper comes from the UCLA Electrical Engineering department. I think the author's name is Chris Kayakis. I have it lying around on my HD somewhere. Let me take a look, and if I find it I will upload it for you. The authors were using the KLT and then comparing a scalar quantization and entropy coding scheme against a VQ one as the backend. This was a multichannel coder, I should add. ;)



Ahhh, you're talking about Chris Kyriakis (sp?), then, was it perhaps USC, and that the KLT was used to attempt to diagonalize the multichannel signal??

Perchance, could you tell me how they reflected the perceptual thresholds back to the original channel signals?

You had me there for a minute, I rather thought you meant using a KLT as a transform filterbank. Under such conditions, one wonders how one would transmit the basis vectors, eh?
HotshotGG
QUOTE
Ahhh, you're talking about Chris Kyriakis (sp?), then, was it perhaps USC, and that the KLT was used to attempt to diagonalize the multichannel signal??

Perchance, could you tell me how they reflected the perceptual thresholds back to the original channel signals?

You had me there for a minute, I rather thought you meant using a KLT as a transform filterbank. Under such conditions, one wonders how one would transmit the basis vectors, eh?


Yes, it was the KLT and it was a diagonal matrix. I was confused; I haven't read it in a long time and I thought they were using the KLT as a transform filterbank. I think that's where the confusion set in. Again, I don't understand multi-rate signal processing that well, just the regular stuff. I was just questioning its decorrelation properties, which according to this algorithmic implementation appear to be quite good and provide better results. An SMR increase of 2.2 dB is convincing, even though it's a rather small improvement for individual scale factor bands. It was a rather clever idea. Scrap the VQ thing, that was a different paper. ;)

http://viola.usc.edu/newextra/Publication/...E-TSAP_Yang.pdf

ahah! here it is. I am interested in seeing the results from this adaptive filterbank the author is experimenting with.
foxyshadis
Has there been research into the audio modeling properties of curvelets, bandlets, ridgelets etc? Other than terrifically slow, which I've found quite enough in image coding. :P
HotshotGG
QUOTE
Has there been research into the audio modeling properties of curvelets, bandlets, ridgelets etc? Other than terrifically slow, which I've found quite enough in image coding.


The problem here lies in that multi-resolution analysis is, to an extent, really not suited for stationary signals. It can give quite remarkable results in terms of image coding. As was explained in this thread, there is ongoing research, though, into using wavelets in adaptive filterbanks for audio compression, and a number of research papers have been written about it. The results are half and half. ;)
Woodinville
QUOTE(HotshotGG @ Jun 14 2006, 15:05) *

QUOTE
Ahhh, you're talking about Chris Kyriakis (sp?), then, was it perhaps USC, and that the KLT was used to attempt to diagonalize the multichannel signal??

Perchance, could you tell me how they reflected the perceptual thresholds back to the original channel signals?

You had me there for a minute, I rather thought you meant using a KLT as a transform filterbank. Under such conditions, one wonders how one would transmit the basis vectors, eh?


Yes, it was the KLT and it was a diagonal matrix. I was confused; I haven't read it in a long time and I thought they were using the KLT as a transform filterbank. I think that's where the confusion set in. Again, I don't understand multi-rate signal processing that well, just the regular stuff. I was just questioning its decorrelation properties, which according to this algorithmic implementation appear to be quite good and provide better results. An SMR increase of 2.2 dB is convincing, even though it's a rather small improvement for individual scale factor bands. It was a rather clever idea. Scrap the VQ thing, that was a different paper. ;)

http://viola.usc.edu/newextra/Publication/...E-TSAP_Yang.pdf

ahah! here it is. I am interested in seeing the results from this adaptive filterbank the author is experimenting with.


Yep, that was for doing interchannel diagonalization, not for the main MDCT filterbank. It is possible to design a KLT like thing for the MDCT. You do a great lot of work, wind up with an n^2 instead of n log n complexity, and get just about zip, at the cost of a great headache and the need to send the basis vectors every once in a while to rather a lot of resolution.

QUOTE(foxyshadis @ Jun 14 2006, 17:21) *

Has there been research into the audio modeling properties of curvelets, bandlets, ridgelets etc? Other than terrifically slow, which I've found quite enough in image coding. :P


Well, if you were to go to Wm Yost's book on the physiology of the ear, Brian Moore's book on the psychology of hearing, and such, you could derive answers from the measurements into the actual filterbank structure of the ear, I'll bet.

QUOTE(HotshotGG @ Jun 14 2006, 18:26) *

QUOTE
Has there been research into the audio modeling properties of curvelets, bandlets, ridgelets etc? Other than terrifically slow, which I've found quite enough in image coding.


The problem here lies in that multi-resolution analysis is, to an extent, really not suited for stationary signals. It can give quite remarkable results in terms of image coding. As was explained in this thread, there is ongoing research, though, into using wavelets in adaptive filterbanks for audio compression, and a number of research papers have been written about it. The results are half and half. ;)


I'd say zero for 'n' myself. Deepen Sinha wrote the first fairly complete one I've heard of at the U of Minnesota, working for Ahmed Tewfik.

It was great on castanets. It was horrible on stationary signals.

Johnston and Brandenburg started in that direction, did this thing called something like the "hybrid coder", took a fast look at the results, and ran like the wind.
HotshotGG
QUOTE
You do a great lot of work, wind up with an n^2 instead of n log n complexity, and get just about zip, at the cost of a great headache and the need to send the basis vectors every once in a while to rather a lot of resolution.


I know how important complexity is, so I can sympathize with you there.


QUOTE
Johnston and Brandenburg started in that direction, did this thing called something like the "hybrid coder", took a fast look at the results, and ran like the wind.


I am not an Electrical Engineer, but I still think it would be possible to design a filterbank that would give optimal results. I could be wrong about this, but I mean, is that not the whole point of doing research and experimenting with different strategies? That's my two cents. ;)
Woodinville
QUOTE(HotshotGG @ Jun 16 2006, 11:22) *


I am not an Electrical Engineer, but I still think it would be possible to design a filterbank that would give optimal results. I could be wrong about this, but I mean, is that not the whole point of doing research and experimenting with different strategies? That's my two cents. ;)


Well, consider. In order to have the requisite frequency resolution to have good gain, you must have frequency response similar to each bin of an MDCT.

That means that your first filter in your filter tree has to be the same length as the whole MDCT.
The second one has to be half the length, at half the sample rate.
The third one has to be 1/4 that length, at 1/4 the sample rate.

And so on.

Now, what kind of total impulse response length are you going to wind up with? What sort of pre-echo problems?
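
To put rough numbers on that question, here is a small sketch. Assumptions: each stage's filter is half as long as the previous one and runs at half the previous rate, as described above, and the cascade is collapsed to the input rate (each stage k contributes (taps - 1) * 2**k input samples). The 2048-tap starting length is a hypothetical figure, not something from the thread.

```python
def tree_impulse_length(first_len, stages):
    """Length, in input-rate samples, of the cascaded filter along one tree path."""
    total = 1
    for k in range(stages):
        taps = first_len >> k  # filter length halves at each stage
        rate = 1 << k          # stage k runs at 1/2**k of the input rate
        total += (taps - 1) * rate
    return total

for stages in (1, 2, 3, 5):
    print(stages, tree_impulse_length(2048, stages))
```

Each extra stage adds roughly another full first-filter length, so the deepest bands end up with an impulse response several times longer than the first filter, which is where the pre-echo worry comes from.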
Garf
QUOTE

QUOTE
Johnston and Brandenburg started in that direction, did this thing called something like the "hybrid coder", took a fast look at the results, and ran like the wind.


I am not an Electrical Engineer, but I still think it would be possible to design a filterbank that would give optimal results. I could be wrong about this, but I mean, is that not the whole point of doing research and experimenting with different strategies? That's my two cents. ;)


But what are "optimal results?"

Garf
QUOTE(Woodinville @ Jun 16 2006, 01:46) *

I'd say zero for 'n' myself. Deepen Sinha wrote the first fairly complete one I've heard of at the U of Minnesota, working for Ahmed Tewfik.

It was great on castanets. It was horrible on stationary signals.


<tongue in cheek>
I believe you are mistaken. The paper clearly reports "near transparent" coding at 55-63kbps mono (or 110-128kbps stereo without any channel coupling). Clearly, that's an excellent result.
</tongue in cheek>

I have some other paper from Deepen Sinha, which was written together with J. Johnston, also about a hybrid scheme. The idea looks plausible, and they reported improvement over PAC, which was not bad in itself. I wonder what became of that. If I had to guess, they discovered that the impulse-like signals were almost never quite so isolated, and their wavelet filterbank was losing out for most mixed cases exactly because the energy compaction properties of the wavelet filterbank were worse than those of the DCT. Or maybe it was just too slow. I'm just hazarding a wild guess here, anyway.

Woodinville
QUOTE(Garf @ Jun 16 2006, 12:15) *

QUOTE(Woodinville @ Jun 16 2006, 01:46) *

I'd say zero for 'n' myself. Deepen Sinha wrote the first fairly complete one I've heard of at the U of Minnesota, working for Ahmed Tewfik.

It was great on castanets. It was horrible on stationary signals.


<tongue in cheek>
I believe you are mistaken. The paper clearly reports "near transparent" coding at 55-63kbps mono (or 110-128kbps stereo without any channel coupling). Clearly, that's an excellent result.
</tongue in cheek>

I have some other paper from Deepen Sinha, which was written together with J. Johnston, also about a hybrid scheme. The idea looks plausible, and they reported improvement over PAC, which was not bad in itself. I wonder what became of that. If I had to guess, they discovered that the impulse-like signals were almost never quite so isolated, and their wavelet filterbank was losing out for most mixed cases exactly because the energy compaction properties of the wavelet filterbank were worse than those of the DCT. Or maybe it was just too slow. I'm just hazarding a wild guess here, anyway.


Well, perhaps it could be that while the wavelet decomposition is very good for transients, and the MDCT very good for stationary signals, that the switching between the two is hideously inefficient in terms of both perception and rate gain, and makes the switching so painful as to be nearly useless.

I believe John Princen published a paper on how to deal with mixed cases, but it suffered the same kind of problems in switching, and proved not to reduce the rate much if implemented in n^2 complexity anyhow.
foxyshadis
Hmm, interesting stuff. Has there been research into finding and subtracting transients out of the waveform and handing them to a wavelet sidechannel to compress? Or just piping the residue (however that's convenient to define) of an mdct coding into a wavelet engine? Rather than trying to switch off between the two for each window. You'd have to dampen it so that only areas that had high enough residual energy were passed on, so that the coding would be more efficient and as importantly, encoding and decoding much more performant in unaffected windows.

The braindead way would just be to code an Ogg q0 or equivalent AAC stream and pass the raw residue through a kind of noise gate, but that still wastes some bits on the Ogg side on what would be better modeled by the wavelet.

Maybe that just exposes my noobishness around audio coding.
Garf
QUOTE(foxyshadis @ Jun 17 2006, 04:49) *
Or just piping the residue (however that's convenient to define) of an mdct coding into a wavelet engine?


The MDCT is a transform. So I really wonder what kind of "residue" definition one could give.


QUOTE(foxyshadis @ Jun 17 2006, 04:49) *

The braindead way would just be to code an Ogg q0 or equivalent AAC stream and pass the raw residue through a kind of noise gate, but that still wastes some bits on the Ogg side on what would be better modeled by the wavelet.


Raw residue? Noise gate?

Non comprende.
HotshotGG
QUOTE
Or just piping the residue (however that's convenient to define) of an mdct coding into a wavelet engine?


It's not as simple as it looks. The MDCT is a mathematical transform. It takes a series of coefficients or samples in the time domain and transforms them into the frequency domain. The DCT-IV is almost always used in audio coding, because of the block boundary problems that would otherwise occur and the way the data is reconstructed via a synthesis filterbank (the decode process). It satisfies certain mathematical conditions, which make it optimal. What you are referring to is the entropy coding scheme here, the backend stuff. The only reason it looks promising is because you would never have to worry about any sort of overshoots via the Gibbs phenomenon. That's where pre-echo problems tend to arise, but there are other ways of dealing with it that are considered optimal.



QUOTE
but that still wastes some bits on the ogg side on what would be better modeled by the wavelet.


The problem here is that Vorbis is designed so that it can support a hybrid filterbank, but as was discussed in this thread, there have been several research papers written which attempt to implement an adaptive filterbank, which is similar to a hybrid filterbank that can model different parts of a signal more efficiently. Anyway, to cut to the chase... the results of using adaptive filterbanks with wavelets are half and half. SebastianG at best claims that it can be done, but it's not worth the headache. It would be very similar to MDCT + TNS (AAC). The only difference would be that it would have to be a wavelet packet transform of some sort. That's similar to what the original author of this thread is trying to do.


QUOTE
Non comprende.


I think what he is trying to imply is deploying an entropy coding scheme to transmit the wavelet coefficients, in much the same way Vorbis handles residue data via a VQ book. In essence, having some sort of synthesis filterbank during the decoding step, which is able to reconstruct the data. ;)
foxyshadis
Ah, I didn't realize that the adaptive and hybrid were so similar, thanks.

Garf, I meant the residue after the quantization step, somehow run through IDCT and piped into a wavelet codec. But I guess techniques of this sort have already been largely discarded for more efficient and performant techniques. TNS and other advances really seem to make a big difference for AAC.
HotshotGG
QUOTE
Ah, I didn't realize that the adaptive and hybrid were so similar, thanks.


The terminology is annoying and confusing, but basically an adaptive filterbank is a hybrid filterbank or quite similar to one. That's just a technical way of saying it ;)

QUOTE
TNS and other advances really seem to make a big difference for AAC.


Right. In this case there is no sense in going to the extra trouble. It's easier to use a tool like TNS, which is optimal and can give considerable performance gains. That's what the argument here is and some folks disagree.
Garf
QUOTE(HotshotGG @ Jun 17 2006, 21:17) *
The only reason it looks promising is because you would never have to worry about any sort of overshoots via the Gibbs phenomenon. That's where pre-echo problems tend to arise, but there are other ways of dealing with it that are considered optimal.


"Huh?" I really don't understand a word of what you're trying to say.

QUOTE

It would be very similar to MDCT + TNS (AAC). The only difference would be that it would have to be a wavelet packet transform of some sort. That's similar to what the original author of this thread is trying to do.


I don't believe this at all. I don't think you'd be able to get a compaction as good as the DCT in the steady signal frequencies, and even getting close would require a massive performance penalty over the plain DCT.

It should be worse, not similar. That's why I keep saying the results of the wavelets are going to suck.

This is if we're talking about the transform part. As for operations on the DCT coefficients themselves, I don't have such a strong opinion; I haven't read much work on that nor particularly done research in that area. So any references would be interesting.

QUOTE(HotshotGG @ Jun 18 2006, 08:42) *
It's easier to use a tool like TNS, which is optimal


TNS is optimal? Any proof/paper/reference for such a very bold claim?

I'm not disputing that it's very good, but optimal, boy, that is something different entirely.
HotshotGG
QUOTE
TNS is optimal? Any proof/paper/reference for such a very bold claim?

I'm not disputing that it's very good, but optimal, boy, that is something different entirely.



ok... so maybe I don't understand what the word "optimal" means :lol: . Optimal as in the filterbank has good performance, or optimal as in it satisfies certain mathematical conditions? And to think I was considering Technical Writing as a career. :P


QUOTE
It should be worse, not similar. That's why I keep saying the results of the wavelets are going to suck.


I am sure that's the case, but what the hell, I mean it's an experiment, right? I meant similarities between what Sebastian stated and what the author of the thread is trying to do, something along those lines. If they are similar at all ;)


QUOTE
This is if we're talking about the transform part. As for operations on the DCT coefficients themselves, I don't have such a strong opinion; I haven't read much work on that nor particularly done research in that area. So any references would be interesting.


I will have to look into that. ;)


Garf
QUOTE(HotshotGG @ Jun 18 2006, 10:15) *
QUOTE
TNS is optimal? Any proof/paper/reference for such a very bold claim?

I'm not disputing that it's very good, but optimal, boy, that is something different entirely.



ok... so maybe I don't understand what the word "optimal" means :lol: . Optimal as in the filterbank has good performance, or optimal as in it satisfies certain mathematical conditions? And to think I was considering Technical Writing as a career. :P


Something which is optimal cannot be improved.

The KLT is optimal in the sense stated earlier.

I do not see any reason to believe TNS is optimal.
SebastianG
QUOTE(pest @ Jun 12 2006, 13:40) *

I try to compensate the lack on sinusoids with a more sophisticated entropy-coding
and i like the possibilities you can do with wavelets. there's no final goal, but after
implementing a simple wavelet-video-codec i'm trying this one. you can always learn something :)

Have you checked the amplitude responses of your subbands' synthesis filters? It might be interesting to see how quantization noise gets spread... For example, if you recursively split the (lower) band via the CDF 9/7 filters, one synthesis filter (for the band "689-1378 Hz" at fs = 44.1 kHz) has the following response:
[Image: amplitude response of the synthesis filter for the 689-1378 Hz band]
So, if you add some quantization noise in that band, you'll also have the noise's alias across the spectrum; around 4000 Hz, for example, the rejection is only 27 dB. Admittedly, the performance looks better than I expected. However, I think that the analysis filters' responses look worse (they are not the same because the CDF 9/7 filters aren't orthogonal), which means you'll end up with a lot of aliasing in the subbands. Yes, it gets cancelled in the synthesis stage, but in terms of energy compaction the performance might really suck because one sinusoid ends up not in one subband but in many. (This is always the case, but you typically want to reduce it via analysis/synthesis filters with a higher rejection.)

edit: for the sake of completeness here's the amplitude response of the analysis filter for that band:
[Image: amplitude response of the analysis filter for the 689-1378 Hz band]
Obviously the rejection is poorer compared to the synthesis filter.
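
Editor's sketch: the plots are missing from this archived version, so the following shows one way such a response could be recomputed. It builds the equivalent synthesis filter for the fifth-level highpass band (689-1378 Hz at fs = 44.1 kHz) by upsampling each stage's filter and convolving the stages. The coefficient normalization (synthesis lowpass summing to 2), the function names and the FFT size are assumptions and are not taken from the post; only the shape of the response relative to its peak is meaningful here.

```python
import numpy as np

# CDF 9/7 synthesis filters. Assumed normalization: lowpass taps sum to 2.
G0 = np.array([-0.091271763114, -0.057543526229, 0.591271763114,
               1.115087052457,
               0.591271763114, -0.057543526229, -0.091271763114])
G1 = np.array([0.026748757411, 0.016864118443, -0.078223266529,
               -0.266864118443, 0.602949018236, -0.266864118443,
               -0.078223266529, 0.016864118443, 0.026748757411])


def upsample(h, factor):
    """Insert factor-1 zeros between taps, i.e. H(z) -> H(z**factor)."""
    out = np.zeros((len(h) - 1) * factor + 1)
    out[::factor] = h
    return out


def equivalent_synthesis_filter(level):
    """Equivalent filter of the highpass band at `level` (1 = top octave)."""
    # The highpass stage sits deepest in the tree, so it is upsampled most.
    eq = upsample(G1, 2 ** (level - 1))
    # Every stage above it on the way to the output is a lowpass stage.
    for k in range(level - 1):
        eq = np.convolve(eq, upsample(G0, 2 ** k))
    return eq


def magnitude_db(h, fs=44100.0, nfft=1 << 16):
    """Magnitude response in dB relative to its own peak."""
    spectrum = np.abs(np.fft.rfft(h, nfft))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    db = 20 * np.log10(np.maximum(spectrum, 1e-12) / spectrum.max())
    return freqs, db


eq = equivalent_synthesis_filter(5)      # band 689-1378 Hz at 44.1 kHz
freqs, db = magnitude_db(eq)
peak_hz = freqs[np.argmax(db)]
print(f"taps: {len(eq)}, peak near {peak_hz:.0f} Hz")
```

Reading `db` at frequencies outside the band gives the rejection discussed above. Swapping in the analysis filters in the same way would give the second plot.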

QUOTE(pest @ Jun 12 2006, 13:40) *

i'm currently experimenting with linear-predictive coding on the subband data. one thing which
bothers me, is that in most papers they suggest using subband-prediction prior quantization
but i fear my lms-filter could get unstable if the quantization-noise is too high. is there any
disadvantage in using it after quantization?

Yeah ... If you do it prior to quantization you don't need the exact same implementation of the filter on the decoder side (in terms of accuracy/precision, rounding errors) if you keep the filters "stable enough" (which is not that hard -- you should use the quantized samples to "train" the filter and add some damping here and there). OTOH, if you do the prediction on the quantized samples as a noiseless encoding step, you need to quantize the prediction as well and code the difference to the already quantized sample. Accuracy/precision is important here. You need to do the exact same thing (preferably only integer arithmetic) in the decoder as you did in the encoder, since a very small change of the prediction can cause the quantizer to round to a different integer sample near thresholds. In such a case you're f****d.

The AAC specification contains something similar to improve the coding performance for sinusoids even on the already very narrow MDCT-subbands (backward-adaptive 2nd order prediction) at the cost of higher complexity, of course. You might want to grab the spec to see how it is done.

Sebastian
pest
QUOTE

Have you checked the impulse responses of your subbands' synthesis filters? It might be interesting to see how quantization noise gets spread...


i already noticed that, and thanks for your detailed explanation. i'm looking deeper into it,
it's just too warm here ;)
i normalize the transform across the bands so that noise in deeper subbands
has the same impact on the whole transform.
do you think the cdf97 filter is suitable for audio?

QUOTE

OTOH, if you do the prediction on the quantized samples as a noiseless encoding step you need to quantize the prediction as well and code the difference to the already quantized sample. Accuracy/Precision is important here. You need to do the exact same thing (preferably only integer arithmetic) in the decoder as you did in the encoder since a very small change of the prediction can cause the quantizer to round to a different integer sample near thresholds. In such a case you're f****d.


the adaptive lms filter is lossless (yet it uses floating point), but why do i have to quantize the
prediction? the filter predicts the quantized subband-data. it's all working as expected.

QUOTE

The AAC specification contains something similar to improve the coding performance for sinusoids even on the already very narrow MDCT-subbands (backward-adaptive 2nd order prediction) at the cost of higher complexity, of course. You might want to grab the spec to see how it is done.


the lms-filter uses up to 32nd order and it's slow as hell, but the improvement with 2nd-order
was too low, because the entropy-coder catches most of the linearity already

back to the beach B)
have a nice day
SebastianG
QUOTE(pest @ Jun 20 2006, 15:09) *

do you think the cdf97 filter is suitable for audio?

No.

QUOTE(pest @ Jun 20 2006, 15:09) *

the adaptive lms filter is lossless (yet it uses floating point), but why do i have to quantize the
prediction? the filter predicts the quantized subband-data. it's all working as expected.

Feeding the adaptive filter with quantized samples is fine. I had these two possibilities in mind though:
CODE

method1:
  encoder-loop over n:
    quantized_n = quantize(signal_n)
    qpredicted_n = quantize(predict())
    encode(quantized_n-qpredicted_n)
    updatefilter(quantized_n)
  decoder-loop over n:
    qpredicted_n = quantize(predict())
    quantized_n = qpredicted_n + decode()
    out(quantized_n)
    updatefilter(quantized_n)

method2:
  encoder-loop over n:
    predicted_n = predict()
    qdiff_n = quantize(signal_n - predicted_n)
    quantized_n = predicted_n + qdiff_n
    encode(qdiff_n)
    updatefilter(quantized_n)
  decoder-loop over n:
    predicted_n = predict()
    qdiff_n = decode()
    quantized_n = predicted_n + qdiff_n
    out(quantized_n)
    updatefilter(quantized_n)

In both cases "updatefilter" is called with the samples distorted by quantization, and "predict" should only work on "quantized_{n-1}, quantized_{n-2}, ...". The difference is that method 1 quantizes before subtracting the prediction, while method 2 quantizes after subtracting it. Method 1 requires you to implement the exact same predictor in the encoder and the decoder, while in method 2 small differences in rounding errors are fine.
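
Editor's sketch: a minimal runnable version of method 2, assuming a normalized LMS predictor and a uniform quantizer. The class name, predictor order, step size `mu` and quantizer `step` are illustrative choices, not anything used in the thread. The predictor is only ever updated with reconstructed samples, so the decoder can run the same loop without access to the original signal.

```python
import numpy as np


class NLMSPredictor:
    """Backward-adaptive normalized LMS predictor (illustrative parameters)."""

    def __init__(self, order=4, mu=0.05, eps=1e-6):
        self.w = np.zeros(order)      # filter weights
        self.hist = np.zeros(order)   # reconstructed samples, newest first
        self.mu = mu
        self.eps = eps

    def predict(self):
        return float(self.w @ self.hist)

    def update(self, sample):
        # Adapt on the prediction error for this sample, then shift it in.
        err = sample - self.predict()
        norm = self.eps + float(self.hist @ self.hist)
        self.w = self.w + self.mu * err * self.hist / norm
        self.hist = np.roll(self.hist, 1)
        self.hist[0] = sample


def encode(signal, step):
    predictor = NLMSPredictor()
    indices = []
    for x in signal:
        predicted = predictor.predict()
        q = int(round((x - predicted) / step))   # quantize the residual
        indices.append(q)
        predictor.update(predicted + q * step)   # train on the reconstruction
    return indices


def decode(indices, step):
    predictor = NLMSPredictor()
    out = []
    for q in indices:
        reconstructed = predictor.predict() + q * step
        out.append(reconstructed)
        predictor.update(reconstructed)
    return np.array(out)


t = np.arange(2000)
signal = np.sin(2 * np.pi * 0.01 * t) + 0.3 * np.sin(2 * np.pi * 0.037 * t)
step = 0.05
indices = encode(signal, step)
decoded = decode(indices, step)
print("max reconstruction error:", np.max(np.abs(signal - decoded)))
```

Because the residual is quantized after the prediction is subtracted, the reconstruction error stays within half a quantizer step per sample. In this sketch the encoder and decoder run identical floating-point code; how much rounding mismatch a real decoder can tolerate depends on how well the adaptive filter is damped.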

QUOTE(pest @ Jun 20 2006, 15:09) *

the lms-filter uses up to 32nd order and it's slow as hell, but the improvement with 2nd-order
was too low, because the entropy-coder catches most of the linearity already

Oh wow! 32nd order backward adaptive predictor?!
So, you try to patch the bad properties of your filterbank using very CPU-intensive prediction tools.
Hmmm... ;)

Sebastian
SebastianG
I came across this thesis today.
("Wavelet Filterbanks in Perceptual Audio Coding")

I just skimmed through it and it looks interesting. It covers many types of subband transforms (QMF, generalized QMF aka biorthogonal wavelets, MDCT, ....) as well as psychoacoustic principles (20 pages).
The filterbank section is really huge ... see page 85 for example. It nicely illustrates the f***ed magnitude responses of a full DWPT (iterated DWT on all bands) for an IMHO-not-that-smart choice of filters...

have fun,
Sebastian
HotshotGG
Thanks. I am reading through it right now. I just read through the overview on Perceptual Audio Coding. :D
Woodinville
QUOTE(SebastianG @ Jun 29 2006, 07:48) *

I came across this thesis today.
("Wavelet Filterbanks in Perceptual Audio Coding")

I just skimmed through it and it looks interesting. It covers many types of subband transforms (QMF, generalized QMF aka biorthogonal wavelets, MDCT, ....) as well as psychoacoustic principles (20 pages).
The filterbank section is really huge ... see page 85 for example. It nicely illustrates the f***ed magnitude responses of a full DWPT (iterated DWT on all bands) for an IMHO-not-that-smart choice of filters...

have fun,
Sebastian


That's interesting, that author thinks Zelinski and Noll was a perceptual coder. Ditto Crochiere's sub-band coders. Be careful, his history is just wrong there. Hall, Atal and Schroeder was the first perceptual attempt, really, and that for speech only. Oddly, as well, there is no mention of AAC.
SebastianG
QUOTE(Woodinville @ Jun 30 2006, 23:09) *

That's interesting, that author thinks Zelinski and Noll was a perceptual coder. Ditto Crochiere's sub-band coders. Be careful, his history is just wrong there. Hall, Atal and Schroeder was the first perceptual attempt, really, and that for speech only. Oddly, as well, there is no mention of AAC.

Thank you for clarifying. (So, I'm assuming Zelinski, Noll and Crochiere all employed (subband) transform methods solely to exploit redundancies, rather than to actually shape the noise spectrally.)

There may be no mention of AAC in the "historic overview" part, but AAC references are all over the place (for example page 1 as "state-of-the-art perceptual coding", as a scheme employing the MDCT in the section describing filterbank requirements, as a scheme that employs a scalar quantizer in the quantization chapter, as a scheme that employs coefficient grouping and Huffman coding for noiseless coding, ...)

I still haven't read it completely, but it doesn't look too bad and he came to conclusions that are "compatible" with mine -- and yours, I think :). I do miss some information, though: what coder did he actually use (w.r.t. side info coding, lossless coding) to "informally" test the four kinds of filters (including his Remez-derived ones)? (Not that I care much, but it seems to be missing.)

Sebastian
Woodinville
QUOTE(SebastianG @ Jul 3 2006, 01:56) *

QUOTE(Woodinville @ Jun 30 2006, 23:09) *

That's interesting, that author thinks Zelinski and Noll was a perceptual coder. Ditto Crochiere's sub-band coders. Be careful, his history is just wrong there. Hall, Atal and Schroeder was the first perceptual attempt, really, and that for speech only. Oddly, as well, there is no mention of AAC.

Thank you for clarifying. (So, I'm assuming Zelinski, Noll and Crochiere all employed (subband) transform methods solely to exploit redundancies, rather than to actually shape the noise spectrally.)

There may be no mention of AAC in the "historic overview" part, but AAC references are all over the place (for example page 1 as "state-of-the-art perceptual coding", as a scheme employing the MDCT in the section describing filterbank requirements, as a scheme that employs a scalar quantizer in the quantization chapter, as a scheme that employs coefficient grouping and Huffman coding for noiseless coding, ...)

I still haven't read it completely, but it doesn't look too bad and he came to conclusions that are "compatible" with mine -- and yours, I think :). I do miss some information, though: what coder did he actually use (w.r.t. side info coding, lossless coding) to "informally" test the four kinds of filters (including his Remez-derived ones)? (Not that I care much, but it seems to be missing.)

Sebastian


It's a little harder than I'd like to reply substantively here.

In any case, Zelinski and Noll was a rate-distortion codec, not a perceptual codec. They had a hint in that a full rate-distortion floor didn't work as well as they expected, but they didn't take it at the time. Crochiere (subband coding) was also approaching it from a rate-distortion viewpoint, although the 16 kb/s codec that got used for the older voice messaging did in fact have a frequency partitioning that wasn't too far off from a start at a perceptual codec. The partitioning for the 9.6 kb/s speech SBC using analog filters and integer band sampling undoubtedly benefited from perceptual effects, but I can confidently say that that wasn't the first intention of the particular partition. Quite some codec, that: you've never seen that many second-order analog sections in one place before, I think, and you may wish to never see such again.

The whole problem with wavelets and with tree filterbanks in general is that they must have a longer impulse response for the same frequency resolution than the same multiband SBC (say an MDCT, which is of course a multiband SBC with exact reconstruction), assuming that the tree goes beyond one stage, of course. There is really no way out of that conundrum.
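
Editor's sketch: a rough illustration of how fast the impulse response of a tree grows. It only counts taps of the equivalent filter for the deepest band of a dyadic tree and prints them next to the window length 2M of an MDCT with M = 2^depth bands, which has the same narrowest bandwidth. The 9-tap filter length and the function names are assumptions chosen for the example. Tap count is a crude proxy and says nothing about stopband rejection, so this does not by itself demonstrate the claim above.

```python
import numpy as np


def tree_equivalent_length(taps, depth):
    """Closed form: taps of the equivalent filter after `depth` cascaded stages."""
    return (taps - 1) * (2 ** depth - 1) + 1


def tree_equivalent_length_by_convolution(taps, depth):
    """Same quantity, obtained by actually convolving upsampled dummy filters."""
    eq = np.ones(1)
    for k in range(depth):
        stage = np.zeros((taps - 1) * 2 ** k + 1)
        stage[:: 2 ** k] = 1.0            # stage filter H(z**(2**k)), all-ones taps
        eq = np.convolve(eq, stage)
    return len(eq)


print("depth  tree taps (9-tap stages)  MDCT window (2M)")
for depth in range(1, 7):
    bands = 2 ** depth                    # uniform bank with the same bandwidth
    print(f"{depth:5d}  {tree_equivalent_length(9, depth):24d}  {2 * bands:16d}")
```

The tap count of the tree roughly doubles with every extra level, on top of whatever length the prototype filters need for decent rejection.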

Also, if one wishes to switch between the two, the "conversion" filters are the worst of both worlds. The rate hit from that kills the ability to switch pretty much dead in my experience. You can think what you like of my unspecified experience, just go try it yourself. :P
SebastianG
QUOTE(Woodinville @ Jul 10 2006, 20:15) *

You can think what you like of my unspecified experience, just go try it yourself. :P

I already did and came to the same conclusions ;)
However, I think that the "conversion filters" used in EPAC (you are referring to?) are not the end of the story. Other designs are possible that may have better "edge" properties (concerning time/frequency localization). (yup, I've 3 designs up my sleeve but I'm not sure whether it's worth publishing since the performance compared to AAC-like structures will still be inferior I suppose)

Sebastian