Full Version: AES paper by Valin, J.-M., Montgomery, C.
Hydrogenaudio Forums > Hydrogenaudio Forum > Scientific Discussion
SebastianG
... Vorbis Psychoacoustic Model to Speex"

I just wanted to point out this publication. It might be of interest to some of you. Maybe it can be discussed here.

For starters, I was quite surprised by what the old noise shaping filter of Speex used to look like (see the first frequency/amplitude plot). Anyone who knows what the output of a modern psychoacoustic model looks like won't doubt that this can be improved.

Funny thing also is the resemblance of the scheme they came up with to the noise shaping algorithm I used here (noise shaping filter: denominator polynomial = LPC analysis filter polynomial, numerator polynomial = fixed noise shaper that additionally moves some noise from lower to higher frequency bands, which I derived by feeding Vorbis with white noise). I didn't describe it in the thread, so you could accuse me of lying, but I remember telling bryant about it.

Anyhow, the fixed numerator I used seemed to work fine for me. Such a filter could be further optimized for speech. Advantage over the scheme proposed by jmvalin and monty: EVEN SIMPLER encoder implementation. Disadvantage: it might be of lower quality, but my instinct tells me the difference in quality should be marginal.
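
To make the filter structure described above concrete, here is a minimal sketch of a shaping filter H(z) = N(z) / A(z), where A(z) is the LPC analysis polynomial and N(z) is a fixed numerator. The coefficient values and function names are hypothetical and chosen only for illustration; they are not the filter derived from Vorbis and they are not taken from the paper.

```python
import cmath
import math

def poly_response(coeffs, omega):
    """Evaluate c0 + c1*z^-1 + c2*z^-2 + ... at z = exp(j*omega)."""
    return sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(coeffs))

def shaping_gain_db(numerator, lpc_analysis, omega):
    """Gain in dB of H(z) = N(z) / A(z) at frequency omega (rad/sample)."""
    h = poly_response(numerator, omega) / poly_response(lpc_analysis, omega)
    return 20.0 * math.log10(abs(h))

# Hypothetical first-order example values (assumptions, not from the thread).
lpc_analysis = [1.0, -0.9]     # A(z) = 1 - 0.9 z^-1, so 1/A(z) follows a low-pass signal
fixed_numerator = [1.0, -0.5]  # N(z) = 1 - 0.5 z^-1, a fixed high-pass tilt

for omega in (0.0, math.pi / 2, math.pi):
    unshaped = shaping_gain_db([1.0], lpc_analysis, omega)         # 1 / A(z) only
    shaped = shaping_gain_db(fixed_numerator, lpc_analysis, omega)  # N(z) / A(z)
    print(f"{omega:.2f} rad/sample: 1/A {unshaped:+.2f} dB, N/A {shaped:+.2f} dB")
```

With these example values, the numerator lowers the noise gain at DC and raises it at the Nyquist frequency relative to 1/A(z) alone, which is the "move noise from lower to higher bands" behaviour in miniature.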


Sebi
bryant
QUOTE(SebastianG @ Apr 14 2006, 10:03 AM) *

Funny thing also is the resemblance of the scheme they came up with to the noise shaping algorithm I used here (noise shaping filter: denominator polynomial = LPC analysis filter polynomial, numerator polynomial = fixed noise shaper that additionally moves some noise from lower to higher frequency bands, which I derived by feeding Vorbis with white noise). I didn't describe it in the thread, so you could accuse me of lying, but I remember telling bryant about it.

Nobody can accuse you of lying. You told me about it here. :)

And I'm still interested in trying it out one day to see how much of a perceptual improvement it (or something similar) could make to WavPack hybrid/lossy mode. If the bitrate for general transparency could be moved from around 300 kbps to around 200 kbps, it would certainly be nice.
Firon
It'd be great if you could drop the bitrate required for transparency significantly; it'd make it even more practical for use in portable devices!
jmvalin
QUOTE(SebastianG @ Apr 15 2006, 03:03 AM) *

For starters, I was quite surprised by what the old noise shaping filter of Speex used to look like (see the first frequency/amplitude plot). Anyone who knows what the output of a modern psychoacoustic model looks like won't doubt that this can be improved.

Funny thing also is the resemblance of the scheme they came up with to the noise shaping algorithm I used here (noise shaping filter: denominator polynomial = LPC analysis filter polynomial, numerator polynomial = fixed noise shaper that additionally moves some noise from lower to higher frequency bands, which I derived by feeding Vorbis with white noise). I didn't describe it in the thread, so you could accuse me of lying, but I remember telling bryant about it.


Actually, neither the Speex noise shaping nor "your" noise shaping is new. The general idea of using the LPC filter for noise shaping dates from *at least* 1984, with the first CELP codec.

QUOTE(SebastianG @ Apr 15 2006, 03:03 AM) *

Anyhow, the fixed numerator I used seemed to work fine for me. Such a filter could be further optimized for speech. Advantage over the scheme proposed by jmvalin and monty: EVEN SIMPLER encoder implementation. Disadvantage: it might be of lower quality, but my instinct tells me the difference in quality should be marginal.


Advantage: none over the current (non-Vorbis-psy) Speex model. BTW, Speex still uses the "old" model by default and the "new" one is still an experimental option for the float version only.
HotshotGG
Interesting. I just skimmed through the paper; I will have to read it when I have more time. ;)
SebastianG
QUOTE(jmvalin @ Apr 16 2006, 02:38 PM) *

Actually, neither the Speex noise shaping nor "your" noise shaping is new. The general idea of using the LPC filter for noise shaping dates from *at least* 1984, with the first CELP codec.

Just saying ... I recognized the similarity which I considered to be notable. Why notable? Because of the mentioned advantage/disadvantage. That's all. No evil implication.

QUOTE(jmvalin @ Apr 16 2006, 02:38 PM) *

Advantage: none over the current (non-Vorbis-psy) Speex model.

Excuse me? If I'm not mistaken, the numerator of "your" noise shaping filter has pretty much the same effect as the static filter I used/described (shifting noise upwards), according to one of your plots. This is certainly what the Vorbis psy model usually suggests (higher SNR for lower frequency regions, lower SNR for upper regions). Since the denominator alone shapes the noise similarly to the original signal, the numerator controls the SNR (more or less independent of the signal's coloration).

You seem to be quite sure about my assumptions not being correct.
Can you elaborate? (Reasoning)


Sebi
jmvalin
QUOTE(SebastianG) *

Excuse me? If I'm not mistaken, the numerator of "your" noise shaping filter has pretty much the same effect as the static filter I used/described (shifting noise upwards), according to one of your plots. This is certainly what the Vorbis psy model usually suggests (higher SNR for lower frequency regions, lower SNR for upper regions). Since the denominator alone shapes the noise similarly to the original signal, the numerator controls the SNR (more or less independent of the signal's coloration).

You seem to be quite sure about my assumptions not being correct.
Can you elaborate? (Reasoning)

Have a closer look at the paper (equation 1). Both the numerator *and* denominator of the noise weighting in Speex (and most other CELP codecs) depend on the signal's LPC coefs. Also, neither the numerator nor the denominator is the LPC filter itself. Using the LPC filter + a static filter as weighting would result in something a bit similar to RELP, which has much poorer quality than CELP.
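
For readers without the paper at hand, the following is a rough sketch of the weighting form commonly used in CELP codecs, W(z) = A(z/γ1) / A(z/γ2), where A(z) is the LPC analysis polynomial. It assumes this is the form the post refers to; the example LPC coefficients, the γ values, and the function names are illustrative assumptions rather than the Speex source code.

```python
def bandwidth_expand(lpc, gamma):
    """Return the coefficients of A(z/gamma): a_k is scaled by gamma**k."""
    return [a * gamma ** k for k, a in enumerate(lpc)]

def weighting_filter(lpc, gamma1, gamma2):
    """Numerator and denominator of W(z) = A(z/gamma1) / A(z/gamma2)."""
    return bandwidth_expand(lpc, gamma1), bandwidth_expand(lpc, gamma2)

def filter_pole_zero(num, den, x):
    """Apply a pole-zero filter as a direct-form difference equation (den[0] == 1)."""
    y = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n - k >= 0)
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc)
    return y

# Hypothetical second-order LPC analysis polynomial A(z) = 1 - 1.6 z^-1 + 0.8 z^-2.
lpc = [1.0, -1.6, 0.8]
num, den = weighting_filter(lpc, gamma1=0.9, gamma2=0.6)  # illustrative gamma values

print("A(z)       :", lpc)
print("A(z/gamma1):", num)
print("A(z/gamma2):", den)
print("impulse response:", filter_pole_zero(num, den, [1.0, 0.0, 0.0, 0.0]))
```

Both polynomials are derived from the same LPC coefficients, but each is a bandwidth-expanded version, so neither equals A(z) itself unless its γ is 1.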
SebastianG
QUOTE(jmvalin @ Apr 17 2006, 04:24 AM) *

Have a closer look at the paper (equation 1). Both the numerator *and* denominator of the noise weighting in Speex (and most other CELP codecs) depend on the signal's LPC coefs. Also, neither the numerator nor the denominator is the LPC filter itself.

I'm very well aware of it and I didn't say anything else. Edit: Oh, I'm sorry, I possibly wasn't clear enough. I was comparing it to your new approach, where you are using the all-pole LPC synthesis filter in combination with another FIR filter.

QUOTE(jmvalin @ Apr 17 2006, 04:24 AM) *

Using the LPC filter + a static filter as weighting would result in something a bit similar to RELP, which has much poorer quality than CELP.

... which is possibly due to only coding the lower part of the spectrum and regenerating the higher part "by the perturbed spectral folding method" (I've no idea what this is about, but it sounds suspect to me).

So, I guess this is opinion against opinion. Without any actual tests we're going in circles.


Sebi
jmvalin
QUOTE(SebastianG @ Apr 18 2006, 07:29 PM) *

I'm very well aware of it and I didn't say anything else. Edit: Oh, I'm sorry, I possibly wasn't clear enough. I was comparing it to your new approach, where you are using the all-pole LPC synthesis filter in combination with another FIR filter.


What you were describing actually sounds closer to the "old" pole-zero model derived from the analysis LPC coefs. The "new" model is not based on the analysis LPC coefs. It's basically a pole-zero representation (transformation) of the Vorbis masking curve.

QUOTE(SebastianG @ Apr 18 2006, 07:29 PM) *

... which is possibly due to only coding the lower part of the spectrum and regenerating the higher part "by the perturbed spectral folding method" (I've no idea what this is about, but it sounds suspect to me).

So, I guess this is opinion against opinion. Without any actual tests we're going in circles.


The spectral folding part is indeed one reason RELP sucked, but not the only one. I *have* done testing with LPC+static and a few variants like that when designing Speex. The current (old) filter is still much better than those. The "new" filter is much more complex, which is why it's not enabled by default. BTW, if you think you can easily beat the current "old" model, I'd be more than happy to be proven wrong. It's actually very simple (and backward-compatible) to change the weighting filter in Speex.