Full Version: Some feeling about Parametric Stereo
Hydrogenaudio Forums > Lossy Audio Compression > AAC > AAC - General
optimus
After some testing, I don't think Parametric Stereo is as good as I expected. I think it's not really a new thing, just the same tech as mp3PRO's LC-stereo, that is, the lower frequency part of the sound is mono (encoded with AAC), and the higher frequency part is a kind of stereo (encoded with SBR).
Garf
I don't know of any other way to say, so: You are really completely wrong, both on the technical and the performance part.

If you had done any serious testing you would not be making this post, which BTW is a direct TOS 8 violation. It is pretty trivial to demonstrate the superiority of PS on <=32kbps samples, likewise it is pretty trivial to demonstrate that it can provide stereo imaging even at low frequencies below the SBR cut-off.
optimus
Please give me more detail so I can see where I was completely wrong. Actually, I hope I was; I hope PS can benefit all of us. Anyway, my "test" is only my subjective impression, based on my ears. The sound is crisp, but it doesn't feel very stereo to me. This was especially clear when I used PS to encode a karaoke audio track, where one channel contains the vocal and the other doesn't: after encoding, both channels contain the vocal.
kjoonlee
Could you provide a short sample of the original track? Lossless compression would help.
optimus
Let's see the difference. This test clip is not even the one with the most serious effect, but the effect is still obvious enough. All the AAC files were encoded with Winamp's AAC+V2 encoder.

The original sound clip (wav file):
http://base.3322.org/ftp/misc/testkaraoke.rar
(The left channel of the clip is music without vocal sound and the right channel is music plus vocal sound)

Encoded with AAC+V2 with PS @ 48kbps:
http://base.3322.org/ftp/misc/testkaraoke-48kbps-ps.aac
(The left channel has very noticeable vocal sound)

Encoded with AAC+V2 without PS @ 48kbps:
http://base.3322.org/ftp/misc/testkaraoke-48kbps.aac
(The left channel has almost no vocal sound)

Encoded with AAC+V2 without PS @ 64kbps:
http://base.3322.org/ftp/misc/testkaraoke-64kbps.aac
(The left channel has no vocal sound just like the original)

Even at the same 48kbps, the one without PS has more separate channels than the one with PS, and at 64kbps, the channels are clearly separated.
Garf
PS is not meant to be used at high bitrates (and 48kbps is probably already over the breaking point).
optimus
So I guess, PS is only a kind of fake stereo, just make people feel it is like stereo (by generating different high frequencies on two channels maybe).

QUOTE(Garf @ Sep 14 2005, 11:49 PM)
PS is not meant to be used at high bitrates (and 48kbps is probably already over the breaking point).
*


Garf
Again, not even close :P It has nothing to do with high frequencies only; I don't know where you get that BS.

It can't always generate a complete stereo image, but it's not meant for that. It's meant to be used when parametrically reconstructing it is more efficient (or will sound better) than normal stereo coding. At low bitrates this is very often true.

Also, don't mistake the flaws of one implementation for flaws of the design.
Mike Giacomelli
QUOTE(optimus @ Sep 14 2005, 09:00 AM)
So I guess, PS is only a kind of fake stereo, just make people feel it is like stereo (by generating different high frequencies on two channels maybe).

QUOTE(Garf @ Sep 14 2005, 11:49 PM)
PS is not meant to be used at high bitrates (and 48kbps is probably already over the breaking point).
*


*



What is this post supposed to mean? How does one fake stereo by having two channels? What would real stereo be?
optimus
This is the explanation of Parametric Stereo encoding procedure I took from CT's site.

QUOTE
The Parametric Stereo encoder extracts a parametric representation of the stereo image of an audio signal, whereas only a monaural representation of the original signal is encoded in a conventional fashion. The stereo image information is represented as a small amount of high quality parametric stereo information and transmitted along with the monaural signal in the bit stream. Based on the parametric stereo information, the decoder is capable of regenerating the stereo image.

user posted image


I don't think it's wrong to regard PS as mono lower freq. part + stereo higher freq part.
optimus
Two-channel does not mean stereo. If two channels have the same signal, can it be called stereo? I think real stereo needs to be stereo at all frequencies.

QUOTE(Mike Giacomelli @ Sep 15 2005, 08:00 AM)
What is this post supposed to mean?  How does one fake stereo by having two channels?  What would real stereo be?
Dibrom
QUOTE(optimus @ Sep 14 2005, 08:19 PM)
This is the explanation of Parametric Stereo encoding procedure I took from CT's site.

QUOTE
The Parametric Stereo encoder extracts a parametric representation of the stereo image of an audio signal, whereas only a monaural representation of the original signal is encoded in a conventional fashion. The stereo image information is represented as a small amount of high quality parametric stereo information and transmitted along with the monaural signal in the bit stream. Based on the parametric stereo information, the decoder is capable of regenerating the stereo image.

user posted image


I don't think it's wrong to regard PS as mono lower freq. part + stereo higher freq part.
*



Based on what?

The quotation doesn't even include the word frequency in any part of it. All it says is that the signal is coded in a conventional monaural fashion, with a parametric representation of the stereo image transmitted along with it. It says nothing about fundamental frequency discrimination.

And what's more, that quotation is an extremely oversimplified explanation of a much more complex process. If you want to know how PS really works, you'd probably need to read some papers on the topic or look at the source code for an encoder that implements it.

Garf has almost assuredly done one and probably both, so I'm inclined to take his word on the matter :)
optimus
Isn't an HE-AAC-PS stream a combination of 22 kHz mono AAC-LC + SBR + PS? If I play a PS stream with decoders that do not support HE-AAC, all of them report it as a 22 kHz mono stream. I think that is just the "conventional fashion". Am I wrong about this?

BTW: I will surely go and read some papers on PS as you suggested.
Garf
The reason for that is that PS needs the same kind of filterbank as SBR does, and the PS data is transmitted "after" the SBR data. If the decoder doesn't do SBR processing, it won't have the filtered data nor the parameters available to do PS.

As I said only 5 times already, PS is not limited to high frequencies at all.

I haven't seen many papers about PS directly, but you can find many about spatial cue coding, which is based on very similar principles.
Defsac
QUOTE(optimus @ Sep 14 2005, 10:43 PM)
After some testing, I don't think Parametric Stereo is as good as I expected. I think it's not really a new thing, just the same tech as mp3PRO's LC-stereo, that is, the lower frequency part of the sound is mono (encoded with AAC), and the higher frequency part is a kind of stereo (encoded with SBR).
SBR has nothing to do with channel separation. It analyses lower frequency information and then uses a specialised noise generator to reconstruct the higher frequency information based on certain traits of the information available. It can't "encode" stereo information, although it can use existing side information in its lower frequency analysis.
optimus
I never said SBR is related with stereo. SBR is great, since mp3PRO.

QUOTE(Defsac @ Sep 15 2005, 05:24 PM)
SBR has nothing to do with channel separation. It analyses lower frequency information and then uses a specialised noise generator to reconstruct the higher frequency information based on certain traits of the information available. It can't "encode" stereo information, although it can use existing side information in its lower frequency analysis.
*


Garf
QUOTE(optimus @ Sep 15 2005, 01:57 PM)
I never said SBR is related with stereo.


Really? Do you understand what SBR is and does?

QUOTE(optimus)
the higher frequency part is a kind of stereo (encoded with SBR).


QUOTE(optimus)
(by generating different high frequencies on two channels maybe).


QUOTE(optimus)
I don't think it's wrong to regard PS as mono lower freq. part + stereo higher freq part.


QUOTE(optimus)
I think real stereo needs to be stereo at all frequencies.


We've corrected your misinformation only like 4 or 5 times.
IgorC
Don't waste your time explaining things to somebody.
Just create a great Nero HE-AAC2 audio codec. :)
Defsac
QUOTE(optimus @ Sep 15 2005, 09:57 PM)
I never said SBR is related with stereo. SBR is great, since mp3PRO.
Garf has already posted the quotes I was going to reference. From these it seems very much like you believe there is some correlation between SBR and stereo separation.
optimus
There is an old Chinese saying, "duan zhang qu yi", which means taking part of someone's words and presenting it as the whole meaning! You are doing this now.
SBR is not related to stereo, as mono audio can also be improved with SBR. SBR is used to expand the frequency range of the sound by generating, instead of faithfully restoring, the high frequencies. PS is the side information that is used to re-create the stereo image from the mono audio, which is encoded in the "conventional fashion" (in this case, AAC LC at 22 kHz mono, maybe including the SBR enhancement). But PS DOES NOT re-create the full stereo image, as I originally expected, just the higher frequency part, as my test shows! You guys should try to digest others' words more deeply before using words like 'completely' or 'xxx times' again and again. I was already trying to avoid very assertive words and to speak as pleasantly as I can, but you guys are not. Can we discuss something instead of offending others?
Gabriel
QUOTE
But PS DOES NOT re-create the full stereo image, as I originally expected, just the higher frequency part, as my test shows!

Your test does not show that PS only reproduces higher freqs in stereo.
Your test just means that you found a testcase that is not handled well by the encoder you tested.

*How do you know that in your testcase the sound stage was properly reproduced in the high freqs?
*If we find even a single case of SBR+PS encoding where the low freq sound stage is preserved, it means that PS also reproduces the sound stage of the low freqs. There are plenty of examples of this.
optimus
As far as I know, vocal sounds are around 1-2 kHz, which is in the lower frequency range I mentioned. Who should be blamed for the unexpected vocal sound in the left channel? Nobody but the one responsible for restoring the stereo image, and that's PS.

QUOTE(Gabriel @ Sep 16 2005, 12:30 AM)
QUOTE
But PS DOES NOT re-create the full stereo image, as I originally expected, just the higher frequency part, as my test shows!

Your test does not show that PS only reproduces higher freqs in stereo.
Your test just means that you found a testcase that is not handled well by the encoder you tested.

*How do you know that in your testcase the sound stage was properly reproduced in the high freqs?
*If we find even a single case of SBR+PS encoding where the low freq sound stage is preserved, it means that PS also reproduces the sound stage of the low freqs. There are plenty of examples of this.
*


Gabriel
Are you trying to tell us that it can sometimes fail or that it always fails?
It seems to me that you demonstrated that it sometimes fails but are claiming that it always fails.
slippyC
I don't know if this has any bearing, but anyway.

Someone else in another thread was talking about problems as well, with stereo. Did you by any chance use Foobar for your listening? If so, I believe the dev of Foobar said it was a problem with FAAD(or whatever that opensource decoder is) and he couldn't do much about it(since it was on their end).

I'm sure someone will jump in and correct me if I'm wrong(also I believe it was on higher versions of Foobar, but don't quote me).
Garf
QUOTE(slippyC @ Sep 15 2005, 08:46 PM)
I don't know if this has any bearing, but anyway.

Someone else in another thread was talking about problems as well, with stereo.  Did you by any chance use Foobar for your listening?  If so, I believe the dev of Foobar said it was a problem with FAAD(or whatever that opensource decoder is) and he couldn't do much about it(since it was on their end).

I'm sure someone will jump in and correct me if I'm wrong(also I believe it was on higher versions of Foobar, but don't quote me).
*



foobar 0.9 beta8 triggered a bug related to this. But there would have been no stereo at all if it had been active.

The bug was actually in FAAD2 and was fixed, but foobar was the only application that was affected by it.

All earlier and all later versions handle PS decoding perfectly fine.
Garf
QUOTE(optimus @ Sep 15 2005, 06:16 PM)
But PS DOES NOT re-create the full stereo image, as I originally expected, just the higher frequency part, as my test shows!
*



I didn't see any "test".

Here's a test: http://sjeng.org/ftp/work/pstest.mp4

Gee, what's that LF sound doing in the left channel?
Ivan Dimkovic
Actually, to add a bit to the confusion - SBR itself has a tool for reducing the stereo payload at SBR frequencies, and this tool is called the "Coupling Tool".

In SBR, it means that we transmit a very low amount of data for the right channel (a balance or "pan" parameter for envelopes and noise, quantized to a 3 dB resolution. Other bitstream parameters are shared between channels).

This mode is usually used for v1 bitstreams below some bit rate (for 44.1 kHz it is 56 kbps).

As for HE-AAC v2 (Parametric Stereo) - the entire codec operates on a mono stream, except for the PS tool, which extracts the stereo information and stores it in the bitstream. The remaining tools (SBR encoder, LC encoder) work as if it were a typical mono file.

The AAC decoder then decodes the LC and SBR mono signal and feeds that data, along with the PS bitstream payload, to the PS decoder, which reconstructs the stereo image using the following parameters, available for all frequency bands - in AAC with PS, you can have 10, 20 or 34 bands over the frequency range, and up to 4 envelopes in the time domain.

Parameters are:

IID - interchannel intensity difference
ICC - interchannel coherence

And, optionally following two sets of parameters for lower frequencies (< 2 kHz):

IPD - interchannel phase difference
OPD - overall phase difference

Typically, at lower bit rates only IID/ICC parameters will be transmitted.

With all this information available, it is possible to reconstruct a very faithful stereo image.
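
As a rough illustration of the two basic parameters, here is a minimal Python sketch that computes an IID and an ICC value for one band of a stereo pair. It is only a sketch under assumptions: the function names `iid_db` and `icc` are made up for this example, the band is given as plain time-domain samples, and a real PS encoder would derive these values per frequency band and per envelope from its filterbank output rather than like this.

```python
import math

def iid_db(left, right, eps=1e-12):
    """Interchannel intensity difference in dB (left energy relative to right)."""
    e_l = sum(x * x for x in left)
    e_r = sum(x * x for x in right)
    # eps only guards against division by zero for silent input
    return 10.0 * math.log10((e_l + eps) / (e_r + eps))

def icc(left, right, eps=1e-12):
    """Interchannel coherence: normalized cross-correlation at zero lag."""
    e_l = sum(x * x for x in left)
    e_r = sum(x * x for x in right)
    cross = sum(a * b for a, b in zip(left, right))
    return cross / math.sqrt(e_l * e_r + eps)

# Example input (an assumption for illustration): a sine on the left,
# and the same sine at half amplitude on the right.
n = 480
left = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
right = [0.5 * x for x in left]

print(iid_db(left, right))  # level difference between the channels
print(icc(left, right))     # similarity of the channels
```

With this input the IID comes out at about 6 dB and the ICC close to 1, since the right channel is just a quieter copy of the left. A sine against a cosine of the same frequency would give an ICC close to 0.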
Garf
Thanks for the detailed info Ivan. You are much more patient than I am :)

PS. Note that lower frequencies (can) have a more detailed parametric description than higher ones.
optimus
This is what I really want to get from the discussion, very subjective and detailed explanation. Thanks for all the information.

QUOTE(Ivan Dimkovic @ Sep 16 2005, 06:02 AM)
Actually, to add a bit to the confusion - SBR itself has a tool for reducing the stereo payload at SBR frequencies, and this tool is called the "Coupling Tool".

In SBR, it means that we transmit a very low amount of data for the right channel (a balance or "pan" parameter for envelopes and noise, quantized to a 3 dB resolution. Other bitstream parameters are shared between channels).

This mode is usually used for v1 bitstreams below some bit rate (for 44.1 kHz it is 56 kbps).

As for HE-AAC v2 (Parametric Stereo) - the entire codec operates on a mono stream, except for the PS tool, which extracts the stereo information and stores it in the bitstream. The remaining tools (SBR encoder, LC encoder) work as if it were a typical mono file.


optimus
And another question: why is PS not suitable or efficient for higher bitrates?
kjoonlee
QUOTE(optimus @ Sep 16 2005, 10:55 AM)
This is what I really want to get from the discussion, very subjective and detailed explanation. Thanks for all the information.
*

Well, don't you think saying "PS is bad" is not a very good way of asking how PS works? Also, if you wanted technical details, why didn't you post in the AAC - Tech forum?

edit:proper quoting
Dibrom
QUOTE(optimus @ Sep 15 2005, 06:11 PM)
And another question: why is PS not suitable or efficient for higher bitrates?
*



This should be pretty simple to understand I think.

Because PS is storing the information to describe the behavior of the signal (in this case the stereo image related portion) parametrically, it necessarily is going to be more coarse-grained than the normal method.

To use an analogy, think of something like this: Say there was some event that happened somewhere, and you would like to know more information about it. You have two possible ways of getting that information: 1) you can ask your friend, who was at that event, to give you a quick description, or 2) you can watch a videotape of the event itself.

Obviously #1 will give you a pretty good idea of what happened, but it won't be a completely faithful portrayal of things -- some information will be incorrect and there's a lot of information simply missing, so you don't have the full context. But for a quick, simple description, that's all pretty much irrelevant. #2 will give you a much better idea of what happened, but if you're short on time, you may not be able to watch all of it just to get the idea you want -- in this case you don't have the resources (i.e., time or bandwidth) to deal with all of the information.

So #1 is like PS, and #2 is like non-PS. In cases where you just want something that's "good enough" (i.e., better than nothing -- mono), you use PS. In cases where you aren't limited in resources as much (i.e., you have more bandwidth available), the coarser-grained approximation afforded by #1 and PS is undesirable, because no matter how well tuned it is, it very likely will never reach the level of fidelity and accuracy of #2 and non-PS.

By the way, this situation should be the same for any sort of parametric techniques, just like SBR. You can always employ more tools and techniques to use in combination to make up for some of the shortcomings though. But eventually there's going to be a lower limit on the amount of information that simply has to be transmitted to have the signal.
optimus
QUOTE(kjoonlee @ Sep 16 2005, 11:02 AM)
Well, don't you think saying "PS is bad" is not a very good way of asking how PS works? Also, if you wanted technical details, why didn't you post in the AAC - Tech forum?
*



Maybe because English is not my mother language, I was not expressing myself very well. Actually I wanted to express that I was a little bit disappointed with PS, at least at a high bitrate like 48Kbps. I didn't say it's BAD. It's just not what I need in my case (the case where stereo and compression are both very important).
The reason why I didn't post this in the tech forum is that I don't really know the details of this tech, and I was not ready to talk about it at that time. But as the discussion goes on, I think I can only be persuaded by a detailed technical explanation, not by the manner in which several guys were speaking.

Dibrom
@optimus

When you quote someone's text, can you please post your response underneath the quote? This is the convention for discussion groups as it allows the discussion to stay sane for long threads and nested quotations. Top posting tends to wreak havoc...
optimus
QUOTE(Dibrom @ Sep 16 2005, 11:57 AM)
To use an analogy, think of something like this:  Say there was some event that happened somewhere, and you would like to know more information about it.  You have two possible ways of getting that information: 1) you can ask your friend, who was at that event, to give you a quick description, or 2) you can watch a videotape of the event itself.

Thanks, I like your analogy!
So up to now, there is still no new tech since MP3 that can bring us better compression without reducing the fidelity and faithfulness of the sound? Maybe that's the reason why MP3 is still so popular, isn't it?
optimus
QUOTE(Dibrom @ Sep 16 2005, 12:49 PM)
@optimus

When you quote someone's text, can you please post your response underneath the quote?  This is the convention for discussion groups as it allows the discussion to stay sane for long threads and nested quotations.  Top posting tends to wreak havoc...
*



Sorry for that. I will do this in future.
kjoonlee
QUOTE(optimus @ Sep 16 2005, 10:55 AM)
This is what I really want to get from the discussion, very subjective and detailed explanation. Thanks for all the information.
*

QUOTE(optimus @ Sep 16 2005, 01:35 PM)
Actually I wanted to express that I was a little bit disappointed with PS, at least at a high bitrate like 48Kbps.
*

I thought what you wanted was an explanation, not to express your feelings. Well, saying you're a bit disappointed with PS isn't a very good way of asking how PS works either!

edit: spelling
optimus
QUOTE(kjoonlee @ Sep 16 2005, 12:55 PM)
QUOTE(optimus @ Sep 16 2005, 01:35 PM)
Actually I wanted to express that I was a little bit disappointed with PS, at least at a high bitrate like 48Kbps.
*

I thought what you wanted was an explanation, not to express your feelings. Well, saying you're a bit disappointed with PS isn't a very good way of asking how PS works either!

edit: spelling
*



Actually I was not going to ask how PS works at the beginning. I just wanted to know if any other people have the same feeling as I do.
Dibrom
QUOTE(optimus @ Sep 15 2005, 08:52 PM)
So up to now, there is still no new tech that can bring us compression improvement while not reducing the fidelity and faithfulness of the sound? That's maybe the reason why MP3 is still so popular, isn't it?
*



No, that's not what I mean. PS, SBR, PNS, and pretty much most of the tools that have been added to AAC over its development timespan all improve quality to some degree (at least the tools that are designed to do that, as opposed to some other function). But they are improving quality in other areas than the high bitrate realm. For most purposes, people seem to feel that 128kbps+ is already "good enough" for most codecs. Now they want to get high quality at 24kbps or so, so development is less focused on the high bitrate range.

And just about every competing lossy codec that has come out after MP3 has been technologically superior on many levels. But for most people, MP3 is already "good enough," at least at 128kbps or higher.

What I meant to imply about parametric versus non-parametric methods is that parametric methods are lower quality than non-parametric methods because they are a different class of techniques.

The traditional approach is analytical -- a signal is examined and undergoes a space reduction as a result of an analytical transformation. After the psymodel indicates what information is expendable, the rest is preserved as well as possible through this analytic transformation -- the information itself does not fundamentally change much, it now just exists in a form which allows for a more compact expression.

A parametric approach is synthetic -- a signal is examined for certain fundamental characteristics, and only the description of these characteristics is saved out of the total information. Later, these characteristics (which have now become parameterized) are recombined with a set of general information about types of signals (this will be in the form of various algorithms and formulas that make up a kind of synthesizer for a specific domain) that can be used to synthesize an approximation of the original form.

The latter approach is far less accurate, but can also be expressed with far less information than the former. For some cases where resources are limited (extremely low bitrate), and when such a technique is employed properly, this can be a good thing. But for cases where resources are not limited, it actually becomes a bad thing. This is why you don't see SBR and PS recommended for high bitrates.
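
A toy example may make the trade-off concrete. The Python sketch below is an assumption-laden model, not the actual PS algorithm: it keeps only a mono downmix and a single level-difference value for the whole signal, then rebuilds two channels by scaling the downmix. All names (`toy_encode`, `toy_decode`, `projection`) are hypothetical. The input is karaoke-style, with the "vocal" present only in the right channel.

```python
import math

def toy_encode(left, right):
    """Keep a mono downmix plus one parameter: the left/right energy ratio."""
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    e_l = sum(x * x for x in left)
    e_r = sum(x * x for x in right)
    return mono, e_l / e_r  # assumes the right channel is not silent

def toy_decode(mono, ratio):
    """Rebuild two channels by scaling the downmix to match the energy ratio."""
    g_l = math.sqrt(2.0 * ratio / (1.0 + ratio))
    g_r = math.sqrt(2.0 / (1.0 + ratio))
    return [g_l * m for m in mono], [g_r * m for m in mono]

def projection(signal, reference):
    """Least-squares gain of `reference` inside `signal`."""
    num = sum(s * r for s, r in zip(signal, reference))
    return num / sum(r * r for r in reference)

n = 420
music = [math.sin(2 * math.pi * 3 * i / n) for i in range(n)]
vocal = [math.sin(2 * math.pi * 7 * i / n) for i in range(n)]
left = music                                    # no vocal on the left
right = [m + v for m, v in zip(music, vocal)]   # music plus vocal on the right

mono, ratio = toy_encode(left, right)
left_out, right_out = toy_decode(mono, ratio)

print(projection(left, vocal))      # vocal content in the original left channel
print(projection(left_out, vocal))  # vocal content in the rebuilt left channel
```

The first value is essentially zero and the second is clearly not: a gain applied to the downmix cannot remove a component that is already mixed into it. In this toy model only the level relationship between the channels survives, which is the kind of loss of detail described above. It says nothing about how well a real encoder with many bands and envelopes handles a given clip.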

Edit: Actually, the answer to your question also depends on whether you mean subjective (i.e., perceptual) "fidelity and faithfulness" or the same thing objectively. Any lossy codec is going to be focused on the former -- the latter is for lossless codecs. In the lossless realm, there haven't been any real significant breakthroughs in recent times that I'm aware of. This is because the problem is a lot more difficult -- it becomes a game of hardcore mathematics relating to information and coding theory and stuff like that. This kind of stuff is traditionally very abstract and quite difficult. If someone were to make a fundamental breakthrough in this realm, they'd probably become famous almost instantly (that is, unless some government agency got ahold of them first :)).

The realm of lossy codecs has a lot more to work with, since our understanding of human perception is still improving pretty rapidly and, as is obvious by work in lossy codecs at low bitrates over the years, there is probably still a lot of headroom (relatively) to exploit before we reach a limit. A lossy coding scheme has a lot more options to exercise in reducing space while keeping quality perceptually similar.
optimus
Your explanation is just like a textbook. :D I really learned something from it.
optimus
When AAC came out, I didn't really feel it had very big advantages over MP3 and thus didn't care about it. When mp3PRO came out, I started to rip CDs with this new format. The high frequency response of mp3PRO excited my ears more than MP3 did, and I started to like the feeling of SBR. When HE-AAC came out, AAC and SBR went together, and the new standard is more open than mp3PRO. All these gave me a pleasant surprise. This time, when I first heard about Parametric Stereo, I also expected very much of it. However, more expectations sometimes bring more disappointment. Of course, this is limited to my case. I am still using Parametric Stereo to re-compress many of my video clips, in which audio quality is not so important. Compared to the original 224Kbps MPEG audio or 128K MP3 audio tracks, the audio track encoded with PS @ 16Kbps really saves lots of bytes. What I am looking forward to now is HE-AAC-PS with VBR, as real-life videos have more silent periods than music videos.
Ivan Dimkovic
The aim of parametric stereo is to improve HE-AAC performance at low bit rates.

And it has fulfilled that demand - compare this with HE-AAC's goal - to improve the quality at bit rates up to 96 kbps (I am talking about 44100 Hz, stereo) - above this threshold, LC-AAC starts to be better, and, really, above 128 kbps, LC-AAC will have better performance than HE-AAC:

- Better pre-echo protection

- Possibility to code complex harmonic structures where SBR is incapable of doing so (i.e. SBR can add only one harmonic per coding band, etc...)

But, at 96 kbps and below, SBR powered AAC is more powerful than LC-AAC, thanks to the huge bit rate reduction due to parametric coding of the high-frequency spectrum by the SBR tool.

Now, the same story goes for Parametric Stereo - this time its aim is to improve HE-AAC performance at bit rates of ~40-48 kbps and below (depending on the case) - above this bit rate, HE-AAC without PS is powerful enough to code the spectrum better because it codes the stereo signal directly, so it can handle some signals that would otherwise be impossible to encode with PS.

But, at 40-48 kbps and below, PS is again - more powerful, in the same way HE-AAC is below 96 kbps, because it provides huge bit rate savings.

So, the bottom line - right tool (or, better, set of tools) for the right bitrate ;)
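
The rule of thumb above can be written down as a small Python sketch. This is only an illustration: `pick_aac_profile` is a hypothetical helper, not part of any encoder, and the default thresholds are assumptions taken from the rough figures in this post (44100 Hz stereo, PS up to roughly 40-48 kbps, SBR up to roughly 96 kbps). A real encoder would also look at the signal itself.

```python
def pick_aac_profile(bitrate_kbps, ps_ceiling_kbps=40, sbr_ceiling_kbps=96):
    """Suggest a tool set for a stereo 44.1 kHz stream at the given bit rate."""
    if bitrate_kbps <= ps_ceiling_kbps:
        # Very low rates: mono core + SBR + parametric stereo
        return "HE-AAC v2 (LC + SBR + PS)"
    if bitrate_kbps <= sbr_ceiling_kbps:
        # Mid rates: direct stereo coding, SBR for the high band
        return "HE-AAC (LC + SBR)"
    # High rates: plain LC-AAC, no parametric tools
    return "LC-AAC"

for rate in (24, 48, 64, 128):
    print(rate, "kbps ->", pick_aac_profile(rate))
```

The PS ceiling is left as a parameter because the crossover is given as a range; passing `ps_ceiling_kbps=48` moves the 48 kbps case into the PS branch.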

QUOTE
What I am looking forward to now is HE-AAC-PS with VBR, as real-life videos have more silent periods than music videos.


There might be a case for HE-AAC v2 VBR, with parametric stereo transmitting phase information as well as updating at relatively high rate (thus requiring 6-8 kbps instead of 2-3 for the basic IID/ICC updated once or twice per frame) - we will definitely test this approach and see if it has some use.
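
To put those side-information figures in proportion, here is a tiny Python calculation. The 32 kbps total is an assumed example bit rate, not a figure from this post, and `ps_share` is a made-up helper name.

```python
def ps_share(total_kbps, ps_kbps):
    """Fraction of the total bit rate spent on PS side information."""
    return ps_kbps / total_kbps

TOTAL_KBPS = 32  # assumed example stream bit rate

for ps_kbps in (2, 3, 6, 8):
    share = ps_share(TOTAL_KBPS, ps_kbps)
    print(f"{ps_kbps} kbps of PS data out of {TOTAL_KBPS} kbps: {share:.1%}")
```

At that assumed total, the richer PS payload would take a much larger slice of the budget than basic IID/ICC, which is why it only makes sense to test it where extra bits are available.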
slippyC
So Ivan or Garf, are you guys going to release a version of Nero that supports HE-AAC and the v2 variety with VBR support?

Is this something that is planned in the NEAR future or something to look forward to down the road? Not meaning soon 1 1/2 years, either.. ;)