Full Version: Another Joint Stereo Discussion
Hydrogenaudio Forums > Lossy Audio Compression > MP3 > MP3 - Tech
ChiGung
QUOTE(phong @ Mar 4 2005, 05:41 PM)
QUOTE
Hmmm, it sounds like you aren't overly concerned with the purity of the S channel, whether it is starved or merely subjected to filtering designed for real channels. This could add fuel to the fire of the pure stereo enthusiasts~

You make it sound as if the S channel is somehow much more important than the M channel when it comes to contributing to the stereo image (or overall audio quality.) I don't think that's a reasonable conclusion. Take for example filters in audio players that give you "wider stereo". They do so by (effectively) making the M channel more quiet. Likewise, since the L/R representation contains the same information as the M/S representation, damaging the L/R channels damages the overall audio just as much as damaging the S channel does.
*



I see the M channel as basically mono, normal sound: the two channels equally combined.
The two channels, Ro and Lo, are my ears; they can hear Ro and Lo independently, or M together.
In the middle is some wiring that interprets the S channel, as well as Ro, Lo or M.
The S channel can induce dizzying sensations and ideas about distance, physicality etc., but it's not interpreted in the same way as Ro, Lo or M, because it's not a noise; it's the difference between simultaneous noises. If we could do a psychoacoustic study of S somehow, I would expect it to differ fundamentally from an M|L|R signal. It could be trickier or easier to fool, but it would only be fooled by similar parameters by pure coincidence, because it's a different thing from a perceived noise.
Lame's -m s stereo doesn't deal with separation, so it loses the opportunity to economise on Lo and Ro through their cross-correlation, but it avoids the danger of changing the cross-correlation in a way which is consistent with perceiving sound but not with perceiving the difference between channels of sound. I'd worry about that until testing has confirmed that what's being done to S is OK.
At the moment the switching mechanism discerns whether bits will be saved by going M/S with both M and S treated as sound. Suppose a stereo mode could still switch, but only when bits are saved with M treated as sound and S treated as a signal that wants to be at least as unharmed as it gets from uncorrelated distortion in -m s mode. That mode could perhaps be said to be practically and theoretically better than -m s for all requirements~

edit: Not that it would be better than the existing -m j for all requirements.
Pio2001
QUOTE(Gabriel @ Mar 4 2005, 11:49 AM)
QUOTE
It sounds as though the side channel is treated as normal. Are the same psychoacoustics used for the S channel that would be used on L&R, Mid -even temporal masking ?

In Lame V3, yes.
*



Er...

It seems absurd.
If the ATH at a given frequency is, say, -70 dB from full scale, we can't assume that because of this, the smallest perceptible difference between left and right at this frequency should also be 70 dB below full scale... especially if the signal is very weak.
For example, 12 dB above the ATH (4 times the amplitude), Lame assumes that the brain can't differentiate a mono signal (100% left, 100% right) from a signal with 75% left and 100% right (because the S channel is below the ATH)? Does this make any sense?
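
To put numbers on the example above, here is a small sketch. It assumes the M = (L+R)/2, S = (L-R)/2 convention and treats the ATH as a single amplitude threshold at one frequency. Neither assumption is taken from Lame's source, and the variable names are made up for illustration.

```python
import math

def db(ratio):
    """Convert an amplitude ratio to decibels."""
    return 20 * math.log10(ratio)

ath = 1.0                       # ATH amplitude at this frequency (arbitrary units)
right = ath * 10 ** (12 / 20)   # 12 dB above the ATH, roughly 4x the amplitude
left = 0.75 * right             # the "75% left, 100% right" case

mid = (left + right) / 2
side = abs(left - right) / 2

print(f"R is {db(right / ath):+.1f} dB relative to the ATH")
print(f"M is {db(mid / ath):+.1f} dB relative to the ATH")
print(f"S is {db(side / ath):+.1f} dB relative to the ATH")
```

Under these assumptions S lands about 6 dB below the threshold while M sits about 11 dB above it, which is exactly the situation the question is about.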
pest
something which came to my mind...

if you transform the stereo channels losslessly but
quantize them afterwards, the decoded waves should be different

i don't know how well the psychoacoustic model adapts to
noise (the difference channel) but there is some room for possible damage

Gabriel
*Every time you do something in a lossy encoder, it could potentially lead to degradations if you are not careful. If you do not want that risk, then do not use any lossy encoder.

*The same psychoacoustics are basically applied to the L/R/M/S channels. This does not mean that the internal thresholds are the same, just that the overall logic is the same.
ChiGung
QUOTE(Gabriel @ Mar 5 2005, 04:11 PM)
*The same psychoacoustics are basically applied to the L/R/M/S channels. This does not mean that the internal thresholds are the same, just that the overall logic is the same.
*


I suppose you must wonder: 'why don't they just go read the source?'
- maybe I should know better than to think such an idea might have been overlooked.
I'd like to play with Lame myself, but I've got a bit to go with C++ before that.
pest
QUOTE(Gabriel @ Mar 5 2005, 08:11 AM)
*Every time you do something in a lossy encoder, it could potentially lead to degradations if you are not careful. If you do not want that risk, then do not use any lossy encoder.


absolutely clear, i'll try to say it in other words...though i did not check the source

i know that m/s is lossless but it's imo senseless to apply
the same model to different types of data...that's some
sort of unexpected behavior because lame is mostly tuned
to music...there's nothing 'wrong' with stereo encoding
but the output of

wav->lossy->wav

and

wav->lossless->lossy->lossless->wav

is different

for example, if you predict a subband lossless and quant it
afterwards...dequant...you really have to take care about
the quantization noise to not drive the lossless decoding nuts
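
A tiny numeric sketch of the point above. It uses a plain uniform quantizer as a stand-in for the lossy step (an assumption for illustration, nothing like a real MP3 encoder) and hypothetical function names. The M/S transform itself loses nothing, but quantizing in the M/S domain and transforming back gives a different result from quantizing L and R directly.

```python
def quantize(x, step):
    """Uniform quantizer standing in for the lossy stage."""
    return round(x / step) * step

def via_lr(left, right, step):
    # wav -> lossy -> wav
    return quantize(left, step), quantize(right, step)

def via_ms(left, right, step):
    # wav -> lossless (M/S) -> lossy -> lossless (back to L/R) -> wav
    mid, side = (left + right) / 2, (left - right) / 2
    q_mid, q_side = quantize(mid, step), quantize(side, step)
    return q_mid + q_side, q_mid - q_side

left, right, step = 0.30, 0.44, 0.25
print(via_lr(left, right, step))   # (0.25, 0.5)
print(via_ms(left, right, step))   # (0.25, 0.25)
```

Here the quantization noise on S is large enough that the M/S path collapses the pair to mono, while the L/R path keeps a (coarse) difference. With a finer step the two paths move closer together.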


edit1: did i just explain your answer?
edit2: spelling
ChiGung
QUOTE(pest @ Mar 6 2005, 08:24 PM)
I know that m/s is lossless but it's imo senseless to apply
the same model to different types of data...
*


It turns out that it's not the same model, pest. ~Just the same 'vr engine' (metaphorically ;)
Your explanation sounds OK to me, but the English ~might be a bit tough with this subject - I find it difficult to be as precise as necessary. Good for you for trying. Maybe we have no purpose here without an interest in the source - see you there someday ;)

edit: added ambiguity
pest
QUOTE(ChiGung)
~might be a bit tough with this subject


i apologize for that :(
ChiGung
nnnnaaah! Churchmouse : )- I have no probs with it, and if I did, that would be my problem, not yours. I can see you have an understanding. I only thought I might warn you: I have my own communication problems.
3ngel
@cAPSLOCK
Well, CAPS, I listened to your two files hyd-js.wav and hyd-s.wav, and I say:

THEY SOUND DIFFERENT

But first of all, I did the test with a Terratec DMX 24/96, a Class A valve amp and a very quiet environment. Moreover, I can explain the differences in detail. But back to the matter:

THE J/S VERSION SOUNDS MORE "LOSSY" THAN ITS L/R COUNTERPART (that is what I call simple stereo)

What are the differences?
Well, the difference is that a PART OF THE GUITAR RIFF underlying the song is almost gone and ALL THE DRUM'S GHOST NOTES are cut entirely (I can say this because I'm a drummer and I can hear their absence ;)

But this conclusion is no mystery, because from a *theoretical* point of view, J/S is not only a way of transforming the stereo image, but a way of compressing it. In fact, J/S is a representation of the L/R channels in terms of SIMILARITIES & DIFFERENCES between the channels. That is, in J/S one channel holds a signal that is the result of the (kind of) AND of the 2 channels, while the other holds the (kind of) XOR of the channels.
But here comes the "trick" (and the fault of J/S in some situations): what happens if the algorithm says that the 2 channels are equal (resulting in full MID and 0 SIDE), while they are not *entirely* equal? It condenses ALL of the 2 channels (what it considers EQUAL) into the MID, and puts nothing in the SIDE.
In the case of the 2 samples by @cAPSLOCK, it's evident that J/S is faulty in this scenario. It was not able to code the drum's and guitar's ghost notes properly in the SIDE, so they were lost.

With this post, I want to say that:
- From a *safe* point of view, and in a PRECISE HIGH QUALITY SCENARIO, it could be safer to encode everything in L/R, so we are sure the J/S algorithm can't go wrong (because it's not used) and strip some important SIDE data by mistake.
- In a NORMAL OR AVERAGE QUALITY SCENARIO, J/S can be acceptable and its *kind of lossiness* is negligible.
Jebus
QUOTE(3ngel @ Jun 1 2005, 01:59 PM)
@cAPSLOCK
Well, CAPS, I listened to your two files hyd-js.wav and hyd-s.wav, and I say:

THEY SOUND DIFFERENT

THE J/S VERSION SOUNDS MORE "LOSSY" THAN ITS L/R COUNTERPART (that is what I call simple stereo)

What are the differences?
Well, the difference is that a PART OF THE GUITAR RIFF underlying the song is almost gone and ALL THE DRUM'S GHOST NOTES are cut entirely (I can say this because I'm a drummer and I can hear their absence ;)


Ugh, posts like this will get you banned, pal. You have to back things up with double-blind listening tests. If you don't like it, post elsewhere. We have a low B.S. tolerance level around here.
3ngel
QUOTE(Jebus @ Jun 1 2005, 11:11 PM)
Ugh, posts like this will get you banned, pal. You have to back things up with double-blind listening tests. If you don't like it, post elsewhere. We have a low B.S. tolerance level around here.
*



Well, well, all right, I don't want to be banned, eheh
I'll do a double-blind test :)
3ngel
@Jebus
Is this sufficient? :D

---------------

foo_abx v1.2 report
foobar2000 v0.8.3
2005/06/02 00:42:45

File A: file://Z:\Desktop\hyd-js.wav
File B: file://Z:\Desktop\hyd-s.wav

00:42:45 : Test started.
00:44:00 : 01/01 50.0%
00:44:24 : 02/02 25.0%
00:44:41 : 03/03 12.5%
00:44:59 : 04/04 6.3%
00:45:07 : 05/05 3.1%
00:45:28 : 06/06 1.6%
00:46:17 : 07/07 0.8%
00:46:28 : 08/08 0.4%
00:46:43 : 09/09 0.2%
00:46:57 : 10/10 0.1%
00:47:14 : 11/11 0.0%
00:47:27 : 12/12 0.0%
00:47:39 : 13/13 0.0%
00:47:56 : 14/14 0.0%
00:48:03 : 15/15 0.0%
00:48:11 : 16/16 0.0%
00:48:21 : 17/17 0.0%
00:48:35 : 18/18 0.0%
00:48:50 : 19/19 0.0%
00:49:03 : 20/20 0.0%
00:50:11 : Test finished.

----------
Total: 20/20 (0.0%)
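
For anyone wondering what the percentage column means: it is consistent with the chance of scoring at least that well by pure guessing, with each trial a 50/50 coin flip. The sketch below computes that probability. The function name is made up, and this is not taken from the foo_abx source.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of getting at least `correct` out of `trials` right by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

for correct, trials in [(1, 1), (5, 5), (10, 10), (20, 20)]:
    print(f"{correct:02d}/{trials:02d}  {100 * abx_p_value(correct, trials):.1f}%")
```

For 20/20 the exact value is 1 in 1,048,576, which a one-decimal percentage display shows as 0.0%.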

Ariakis
Could you upload a lossless version of the clip in the Uploads section, so members can confirm your results, and devs may examine the sample itself?
3ngel
QUOTE(Ariakis @ Jun 2 2005, 12:20 AM)
Could you upload a lossless version of the clip in the Uploads section, so members can confirm your results, and devs may examine the sample itself?
*


The links to the sample .wav files are at the beginning of this thread, posted by @cAPSLOCK
Ariakis
QUOTE(3ngel @ Jun 1 2005, 06:32 PM)
The links to the sample .wav files are at the beginning of this thread, posted by @cAPSLOCK
*



Haha, indeed. Pardon my haste.
stephanV
Hearing the difference between the 2 samples is not so difficult. I used my laptop speakers for this.
QUOTE
foo_abx v1.2 report
foobar2000 v0.8.3
2005/06/02 12:39:18

File A: file://C:\Documents and Settings\Administrator\Desktop\hyd-js.wav
File B: file://C:\Documents and Settings\Administrator\Desktop\hyd-s.wav

12:39:18 : Test started.
12:40:51 : 01/01  50.0%
12:41:03 : 02/02  25.0%
12:41:10 : 03/03  12.5%
12:41:31 : 04/04  6.3%
12:41:54 : 05/05  3.1%
12:42:03 : 06/06  1.6%
12:42:08 : 07/07  0.8%
12:42:14 : 08/08  0.4%
12:42:18 : 09/09  0.2%
12:42:26 : 10/10  0.1%
12:42:43 : Test finished.

----------
Total: 10/10 (0.1%)


The question is whether comparing just the S channels makes any sense. Probably not. The results at the start of the 2nd page are much more interesting, although admittedly cAPSLOCK said he did not have a preference for either one.
2Bdecided
It's well documented that, when listening to the difference channel ("S" channel) in isolation, artefacts are clearly audible with lame aps and Musepack standard/q5. These artefacts are due to the starving of the S channel.

However, the question is, when listening to straight Left and Right (as normal people will do!), can you hear any artefacts?

If not, there is no problem. If it makes no audible difference to stereo playback, then it is perfectly sensible (even correct, given the aims of a lossy codec) to starve the "S" channel of bits.

Cheers,
David.
2Bdecided
QUOTE(ChiGung @ Mar 4 2005, 11:58 PM)
I see the M channel as basically mono, normal sound: the two channels equally combined.
The two channels, Ro and Lo, are my ears; they can hear Ro and Lo independently, or M together.
In the middle is some wiring that interprets the S channel, as well as Ro, Lo or M.
The S channel can induce dizzying sensations and ideas about distance, physicality etc., but it's not interpreted in the same way as Ro, Lo or M, because it's not a noise; it's the difference between simultaneous noises. If we could do a psychoacoustic study of S somehow, I would expect it to differ fundamentally from an M|L|R signal. It could be trickier or easier to fool, but it would only be fooled by similar parameters by pure coincidence, because it's a different thing from a perceived noise.
Lame's -m s stereo doesn't deal with separation, so it loses the opportunity to economise on Lo and Ro through their cross-correlation, but it avoids the danger of changing the cross-correlation in a way which is consistent with perceiving sound but not with perceiving the difference between channels of sound. I'd worry about that until testing has confirmed that what's being done to S is OK.
At the moment the switching mechanism discerns whether bits will be saved by going M/S with both M and S treated as sound. Suppose a stereo mode could still switch, but only when bits are saved with M treated as sound and S treated as a signal that wants to be at least as unharmed as it gets from uncorrelated distortion in -m s mode. That mode could perhaps be said to be practically and theoretically better than -m s for all requirements~

edit: Not that it would be better than the existing -m j for all requirements.
*



That's a very interesting post, and carefully thought out.

However, where the "S" channel is large (i.e. there are significant differences between L and R - irrespective of what they are) you'll find most codecs simply switch to encoding the two channels separately - as L and R rather than M and S.

Binaural masking is slightly understood, as are the mechanisms which pick up on interaural differences. Most codecs largely ignore this stuff because, firstly the interaural difference signal is so completely different depending on whether you listen through headphones or speakers that it's hard to do anything meaningful that would account for both. Secondly, codecs work well enough without taking account of all this stuff, so, to be blunt, why bother?

Having said that, maybe there is some greater efficiency or quality waiting to be found in lossy codecs by taking this into account.


It's part luck, and part design, that existing codecs don't wreck many of the cues used to determine space, depth, location. I say this because, when encoding mono, you could trash some of the phase information and still have a good sounding result, while trashing the phase information differently on two channels of a stereo track would ruin the sound. Luckily, even quite poor quality encodes keep interaural phase reasonably intact (unless you use intensity or parametric stereo!).

Cheers,
David.

P.S. Most pan potted "stereo" pop music has no real phase difference between the two stereo channels, though phase and timing differences appear at the ears when listening over loudspeakers.
3ngel
@2Bdecided
I agree with you that comparing only the S sides of a stereo and a J/S encode can be questionable, because even if the S of J/S is worse than that of L/R, this does not mean that S + M is worse than L/R from a *perceptual* point of view. This is because the mixing of S+M can compensate for the losses that are nevertheless present in the S side.
So I think that we can conclude that:

- For a super-safe paranoid encode it's better to always use L/R stereo
- For all other cases a dynamic J/S (that is, M/S + L/R) can be good :)
ChiGung
Thanks, 2Bdecided, for the comment; it's good to hear your insight.

QUOTE(3ngel @ Jun 2 2005, 03:12 PM)
- For a super-safe paranoid encode it's better to always use L/R stereo

For 'enhanced precision stereo' I prefer the idea of weighing things in L/R's favour using --nsmsfix 1 rather than forcing L/R, which in effect just disables the option of M/S notation.

That's assuming everything is taken care of - such as a 'quantisation continuity consideration' that occurred to me: iirc some of the q# algorithms consider not just the distortion introduced for the frame, but how that distortion would contrast with the previous frame. That sort of noise discernment would be more difficult if the frames being related use different notations (L/R and M/S), or it could be impractical anyway.

If everything is working smoothly though, I think --nsmsfix should be the ideal way of spending extra bits on stereo.
3ngel
Well yes, obviously we're talking about a stereo question, and I see no point in weighing towards L/R in a J/S context. I think that in a J/S context the encoder already does a good job of selecting the right weighting. The goal of my "secure" and "paranoid" definition is to do away completely with the lossiness introduced by the J/S algorithm, in order to keep "manipulation" of the signal to a minimum.
So there's no point in "weighing". Either use dynamic J/S, or don't use it at all for complete stereo preservation. On the other hand, naturally you can do all the weighing you prefer, but you'll not reach (well, we can say) "maximum quality" (in a given bitrate context). Because perfect stereo preservation can contribute to the "clean sound" of an mp3 and so to its overall quality.
ChiGung

Throughout a track there should, by simple chance at least, be some frames in which the two channels are most accurately described using M/S notation. Not only in combination: sometimes the stereo image will, by chance of the system, be noted most accurately by an M/S phrase - affected by the relative difference between the channels, but also by semi-random mathematical synchronicity % }
Independent stereo simply disallows such preferable M/S notations when they could occur, whereas joint stereo with a low nsmsfix only discourages M/S notation. If it's working right, then applied to the test samples that started this thread, nsmsfix should be able to produce an S track superior to the one produced by the independent stereo encode.

I have read developers advising nsmsfix, rather than -m s mode, for problems with enhanced stereo playback.
ChiGung
wibble thinkbubble musing:..
{
/*present discernment ~roughly,
Generate plausible L/R quantums (noting what size they huff to)
PsychoAc rates their transparency (ignores S)
Generate plausible M/S quantums
Rate their transparency (ignores LR) */

//alternative discernment,
Generate plausible L/R quantums
Rate their transparency each (ignores S)
Generate plausible M/S quantums
>Reconvert posed M/S quantums to equivalent L/R notations
Rate their transparency (same as L/R)
}
Using M/S notation, but with no attempt to psychoanalyse it as M/S; instead the L/R notation can be regenerated (in a form lamepsycho understands) and analysed the same as straight L/R - only it's L/R re-derived from M/S quantums.

Quantums rated as usual then, something like:
(noisiness-margin)*(noisiness)*(bitsRequired+bitResvrTweaker)/blocklength,
- lowest is most desirable
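
A toy sketch of the "alternative discernment" and the rating expression above, to make the idea concrete. Everything here is a hypothetical stand-in: the uniform quantizer, the mean-squared-error "noisiness" (a real encoder would ask its psychoacoustic model), the log2-based size estimate, and all function names. None of it is how Lame actually works. The one thing it does follow from the post is that candidates in either notation are decoded back to L/R before being rated.

```python
import math

def encode(left, right, mode, step):
    """Quantize a block in L/R or M/S notation; returns two lists of integer indices."""
    if mode == "MS":
        a = [(l + r) / 2 for l, r in zip(left, right)]
        b = [(l - r) / 2 for l, r in zip(left, right)]
    else:
        a, b = left, right
    return [round(v / step) for v in a], [round(v / step) for v in b]

def decode_to_lr(ia, ib, mode, step):
    """Always hand back L/R, whatever notation the indices were stored in."""
    a = [i * step for i in ia]
    b = [i * step for i in ib]
    if mode == "MS":
        return [m + s for m, s in zip(a, b)], [m - s for m, s in zip(a, b)]
    return a, b

def noisiness(left, right, dec_l, dec_r):
    """Toy rating: mean squared error measured in the L/R domain only."""
    errors = [(x - y) ** 2 for x, y in zip(left + right, dec_l + dec_r)]
    return sum(errors) / len(errors)

def bits_required(ia, ib):
    """Crude size estimate, purely illustrative."""
    return sum(math.log2(1 + abs(i)) for i in ia + ib)

def score(noise, bits, blocklength, margin=0.0, tweak=0.0):
    """The rating expression from the post; lowest is most desirable."""
    return (noise - margin) * noise * (bits + tweak) / blocklength

def best_mode(left, right, step):
    results = {}
    for mode in ("LR", "MS"):
        ia, ib = encode(left, right, mode, step)
        dec_l, dec_r = decode_to_lr(ia, ib, mode, step)
        results[mode] = score(noisiness(left, right, dec_l, dec_r),
                              bits_required(ia, ib), len(left))
    return min(results, key=results.get), results

left = [0.31, -0.22, 0.53, 0.11]
right = list(left)                 # pure mono block
print(best_mode(left, right, 0.1))
```

For this mono block both notations decode to the same L/R samples, so the noisiness is identical and M/S wins purely on size. Note that the expression goes negative when the noisiness is below the margin, which flips the preference towards more bits; the sketch keeps the margin at zero to sidestep that.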

The fullest search would also include switch block and long or short block possibilities for both [this] block and the [next] block (and ideally the one after) to find the best pair/triplet, but write only [this] block and move on to the [next] as the new [this] (reassessing [old.next] to ensure the ideal stepping of block sizes over audio details).

I'm thinking that could be called HE-Stereo %}
justin1972uk
QUOTE(Gabriel @ Mar 4 2005, 12:41 AM)
Compressing the S channel more (sometimes called side channel starving) is a nice idea for high compression ratios. The question is just how to starve it carefully. Right now Lame doesn't starve the side channel by default (although it has the ability to do it).


How would I switch this on?
Pio2001
QUOTE(2Bdecided @ Jun 2 2005, 01:03 PM)
P.S. Most pan potted "stereo" pop music has no real phase difference between the two stereo channels, though phase and timing differences appear at the ears when listening over loudspeakers.
*



I've heard this remark in audiophile circles.
However, I listen to electronic "pop" music since 1982, and among the 2400 electronic tracks in my playlist, I don't think that there is a single one that uses "pan" stereo without phase effects.

QUOTE(stephanV @ Jun 2 2005, 12:49 PM)
The question is if comparing just the S channels makes any sense.
*



No more than comparing the difference of the original minus the encoded sample, I think.

QUOTE(3ngel @ Jun 2 2005, 04:12 PM)
- For a super-safe paranoid encode it's better to always use L/R stereo
*



Until proven otherwise, at a given bitrate, Joint Stereo IMPROVES quality; it does not DECREASE it. Listening to the side channel is not proof, because firstly, as has been noted, what's audible in the S channel might not be audible in the L/R version, and secondly, if the S channel is worse, maybe the M one is better. And a better M channel may be more important than a better S one, since it is closer to the representation of the real music.
In the end, only stereo comparisons are valid.

But what I don't understand in this discussion is why L/R encoding would do any good to the stereo image. In L/R encoding, psychoacoustics are applied independently to the left and right channels. Even if their effect is inaudible on each channel, it might become audible in stereo, for example if the phase is modified in one channel and not in the other. Both phase modifications might be judged inaudible, though their combination in stereo listening could be audible. Using Mid/Side encoding, a part of the psychoacoustics is applied in a similar way to the left and right channels, and the rest is applied so as not to audibly harm the difference between left and right.
So, without having performed any listening tests, my a priori expectation would be that M/S stereo preserves the stereo image, while L/R destroys it, since in the first case, L and R distortions are correlated, while in L/R, they are independent.

We've seen that the side channel is better with L/R than with M/S, but the M/S one has gone through a psychoacoustic model that guarantees that the distortion is masked in the final result, while the distortion of the L/R version of the S channel is completely independent of the psychoacoustic model, and could then be plainly audible, since it is not masked.
znode
QUOTE(3ngel @ Jun 1 2005, 02:59 PM)
But here comes the "trick" (and the fault of J/S in some situations): what happens if the algorithm says that the 2 channels are equal (resulting in full MID and 0 SIDE), while they are not *entirely* equal?
*


Uh. What? The "algorithm" is not perceptual, it's arithmetic. If it says they're equal, they're durn well equal.

QUOTE(3ngel @ Jun 2 2005, 10:44 AM)
The goal of my "secure" and "paranoid" definition is to do away completely with the lossiness introduced by the J/S algorithm, in order to keep "manipulation" of the signal to a minimum. So there's no point in "weighing".
*


Uh. What? The whole POINT of lossy is weighing, to put LIMITED BITS to the best use. The only "weighing" argued here is purely to save the bits for better usage, rather than wasting it in making the "manipulation" of the "stereo image" less. In most cases, perfecting the L/R preservation is certainly not the best use of said limited amount of bits.

QUOTE(3ngel @ Jun 2 2005, 10:44 AM)
for complete stereo preservation. On the other hand, naturally you can do all the weighing you prefer, but you'll not reach (well, we can say) "maximum quality" (in a given bitrate context).
*


Uh. What? Of course you will. "Maximum quality" obviously does not depend on stereo preservation.

Heck, you're not even arguing for stereo preservation. You're arguing an arbitrary L/R data preservation, which may or may not be the optimal stereo preservation, depending on the bitrate.

At very low bitrates, bits are much better used cutting down as much phasing and warbling as possible; they'd be noticed much more than stereo shift. And J/S saves the bits better than L/R, with no proven reduction in stereo image preservation except in problem cases.

At very high bitrates, J/S would obviously be transparent to L/R. As phong had explained, J/S and L/R themselves are both equally lossless; it's how the channels are quantised afterwards that makes a difference. A difference that can be either the L/R being inferior or the J/S being inferior.

Maximum quality would be first to weigh the relative advantage of preserving stereo data, and how much to preserve, against preserving other things. Then, one weighs the relative merits of J/S and L/R. I don't see how your "weighing is pointless" ever comes into play in the lossy arena.

QUOTE(3ngel @ Jun 2 2005, 10:44 AM)
Because perfect stereo preservation can contribute to the "clean sound" of an mp3 and so to its overall quality.
*


Of course stereo preservation contributes. But HOW MUCH it contributes, weighed against other elements, is the entire point of lossy. Best use of limited bits. See diminishing returns. You might want to try lossless, rather than L/R "PRECISE SUPER SECURE PARANOID HIGH QUALITY" encodes with "cleaner sounding" "stereo preservation".
PatchWorKs
[image]

Campaign for Real Stereo! :D
2Bdecided
QUOTE(Pio2001 @ Jun 2 2005, 11:47 PM)
QUOTE(2Bdecided @ Jun 2 2005, 01:03 PM)
P.S. Most pan potted "stereo" pop music has no real phase difference between the two stereo channels, though phase and timing differences appear at the ears when listening over loudspeakers.
*



I've heard this remark in audiophile circles.
However, I listen to electronic "pop" music since 1982, and among the 2400 electronic tracks in my playlist, I don't think that there is a single one that uses "pan" stereo without phase effects.


Are you sure? There are certainly many tracks that do interesting things with phase for special effects - but generally that doesn't happen because any phase difference between the two channels causes comb filtering when combined to mono. It would sound terrible on a car radio as it switched to mono or blend/mono in weak reception areas.

QUOTE
We've seen that the side channel is better with L/R than with M/S, but the M/S one has gone through a psychoacoustic model that guarantees that the distortion is masked in the final result, while the distortion of the L/R version of the S channel is completely independent of the psychoacoustic model, and could then be plainly audible, since it is not masked.
*



You can contrive samples where, at least for some encoders, L/R encoding wrecks the S channel. Get a mono signal, and change it in some way so that it's still nearly mono, but there's a very very small difference between the two channels - something inaudible like -80dB of noise. This will be enough to cause the L and R representation to be different; if you subtract R from L to get S, you'll hear lots of coding noise. Starting with the same signal, a sensible M/S encoding will see the original tiny -80dB S, and ignore it, giving silent S.
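
A rough numeric illustration of this contrived case. The assumptions are loud ones: a coarse uniform quantizer stands in for the whole encoder, there is no psychoacoustic model at all, the "mono" source is just random samples, and the -80 dB difference is uniform noise. It is only meant to show the mechanism, not what any real codec outputs.

```python
import math
import random

def quantize(x, step):
    """Uniform quantizer standing in for the lossy stage."""
    return round(x / step) * step

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

random.seed(0)      # fixed seed so the run is repeatable
step = 0.01         # deliberately coarse quantizer
noise = 1e-4        # -80 dB relative to a full-scale amplitude of 1.0

left = [random.uniform(-0.5, 0.5) for _ in range(48000)]    # stand-in mono source
right = [l + random.uniform(-noise, noise) for l in left]   # nearly identical copy

side_orig = [(l - r) / 2 for l, r in zip(left, right)]
# L/R coding: quantize each channel on its own, then derive S from the result
side_lr = [(quantize(l, step) - quantize(r, step)) / 2 for l, r in zip(left, right)]
# M/S coding: quantize S directly
side_ms = [quantize(s, step) for s in side_orig]

print("original S rms:     ", rms(side_orig))
print("S after L/R coding: ", rms(side_lr))
print("S after M/S coding: ", rms(side_ms))
```

Whenever L and R fall on opposite sides of a quantizer decision boundary, the L/R path produces a full step of difference, so its derived S ends up well above the original S. The M/S path rounds the tiny S to silence.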

Neither L/R nor M/S encodings ensure that "spatial" details of the original signal are preserved, or that artefacts are avoided, in the final result - like I said in my reply to ChiGung, compared to the complexity of human spatial hearing, it's partly luck that psychoacoustic codecs sound OK in this respect. There's little design to preserve the spatial cues we can hear, and dump the ones we can't - unless Gabriel knows different in lame?

Cheers,
David.
Gabriel
QUOTE
There's little design to preserve the spatial cues we can hear, and dump the ones we can't - unless Gabriel knows different in lame?

In Lame v3, that is right. The only "safety" is that in M/S we try never to allocate 0 bits to the side channel (totally removing the side channel would lose phase info).

Pio is also right regarding the safety level that M/S provides against L/R regarding stereo unmasking.

I think that there are probably more tricks regarding spatial cues and binaural masking/unmasking in Nero AAC encoder (but this is just speculation based on previous posts from Ivan).
3ngel
@Znode
Can you point me to a link demonstrating that the M/S conversion from L/R is strictly math (and so lossless)? The only lossless algorithm would be a strict AND / XOR, that is, a bit-by-bit comparison, but this can't be the case. I think that, in order to decide what's equal and what's different, some kind of filtering is applied, which is subject to "threshold error", possibly resulting in lost data.
But I'll wait for a valid link in order to understand if you're right :)
Ariakis
Unless I'm mistaken:

M = (L + R) / 2
S = (L - R) / 2

so on reconstruction:

L = M + S
R = M - S
2Bdecided
M=L+R
S=L-R

L=(M+S)/2
R=(M-S)/2

how lossless do you want? ;)

You can divide by sqrt(2) at each stage instead, or divide by 2 at the first stage instead of the second - same result either way. Obviously you need one extra bit either LSB or MSB (depending on where you put the divide by 2) to prevent rounding errors / overflow. Given this, and integers at the start, the process is completely lossless.
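
As a quick check of the integer version above (M = L+R, S = L-R, halve on the way back), here is a short sketch using 16-bit sample values at the extremes of the range. The function names are made up; the arithmetic is just the four formulas.

```python
def lr_to_ms(left, right):
    return left + right, left - right

def ms_to_lr(mid, side):
    # mid + side == 2*left and mid - side == 2*right, so both divisions are exact
    return (mid + side) // 2, (mid - side) // 2

lo, hi = -32768, 32767   # 16-bit sample range
samples = [lo, lo + 1, -2, -1, 0, 1, 2, hi - 1, hi]
for l in samples:
    for r in samples:
        m, s = lr_to_ms(l, r)
        assert ms_to_lr(m, s) == (l, r)
        # the one extra bit: M and S both fit in a signed 17-bit range
        assert -65536 <= m <= 65535 and -65536 <= s <= 65535
print("round trip exact for all", len(samples) ** 2, "pairs")
```

The sum and difference of two integers always have the same parity, which is why the halving on the way back never has to round.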

Cheers,
David.
EDIT: I must be getting slow!
3ngel
QUOTE(2Bdecided @ Jun 3 2005, 10:16 AM)
M=L+R
S=L-R

L=(M+S)/2
R=(M-S)/2

how lossless do you want? ;)

You can divide by sqrt(2) at each stage instead, or divide by 2 at the first stage instead of the second - same result either way. Obviously you need one extra bit either LSB or MSB (depending on where you put the divide by 2) to prevent rounding errors / overflow. Given this, and integers at the start, the process is completely lossless.

Well, I don't think it's so simple. If it were, there would be no problems coding 2-channel surround sources in J/S, but from what I've read so far, J/S is not recommended in a surround context, because it alters (or destroys) the surround. If it were lossless (and it were all a question of bitrate), there would be no problems coding everything (surround or not), because it would be perfectly reversible.
user
It is a well-known fact, established at least during the lame alt-preset development days when various people played with nsmsfix values, that js in lossy coding is not perfect for complete reproduction.
So you have the nsmsfix switch with its variable, mentioned in this topic before.
Then there is the --nssafejoint switch; unfortunately it is only contained in preset insane (320k), and it cannot override preset standard & extreme (at least in the original 3.90.x versions, iirc).
In fact, the presets other than insane (which has nssafejoint) each use a certain msfix value, iirc, so there are differences between "js" and "js" in lame by default.

Another comparable topic: the --ms 15 switch (also --ms 13 etc.) in MPC, Musepack.
The various quality settings q5, 6, 7-9 and 10 use a different --ms [var] switch by default, but all use joint stereo, even quality 10: it uses --ms 15 (the most advanced joint stereo mode) by default, 7-9 use --ms 13, and so forth.
It was found that e.g. quality 5 had some artefacts, e.g. when listening via DPL2/Logic7 decoders, which is post-processing, of course; phase differences are important here. Adding --ms 15 (even to q5 or q6) solved the problems at not much higher bitrates (bitrates were always clearly lower than the next q-level). The topic can be found here at HA. Additionally, there is the theoretical statement of Klemm, the developer in those days: adding --ms 15 to q-levels lower than q10 (where it is the default) can only improve sound quality, and in no case make it worse.

edit/addon:

The developers of Lame mp3 and MPC will have had their reasons for inventing different joint stereo modes in steps. Maybe somebody can explain how msfix and nssafejoint work exactly, or how the --ms xy modes differ.
ChiGung
QUOTE(2Bdecided @ Jun 3 2005, 09:33 AM)
Neither L/R or M/S encodings ensure that "spatial" details of the original signal are preserved, or that artefacts are avoided, in the final result - like I said in my reply to ChiGung, compared to the complexity of human spatial hearing, it's partly luck that psychoacoustic codecs sound OK in this respect.
This is a common enough aspect of technology to be remembered: the luck of not harming unaccounted aspects is the reason why right-'sounding' theories often work out even though they are mostly all ultimately insecure. Only the exceptions remind us that understanding is never complete, e.g. PCBs, CFCs, BSE..
Let's hope luck holds out for GM organisms and Killer Strangelet scenarios ;)

To me the saying that "distortion in S is masked by M" is a conveniently lucky, right-sounding phrase, but as M increases, S is not 'masked', nor does noise in it 'fall in the middle' (these statements are paraphrased from general lurkage and a copyrighted wiki article). Imho M and L and R are similar objects, but S is an object rendered in the same substance from a different domain.
QUOTE
There's little design to preserve the spatial cues we can hear, and dump the ones we can't - unless Gabriel knows different in lame?

My algorithmic suggestion was not to attempt to psychoanalyse S, then, but to use the notation and reconvert to L/R to check the quality of the proposed quantisation using the more established methods of analysing sound. If there are some checks available to perform on S, why do them only for M/S-noted quanta, why not for all? Speed and simplicity, I guess, but it seems overcomplicated to switch perspectives on the psychoacoustics just because the notation is different.
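A rough sketch of that suggestion, under loud assumptions: the `quantise` function below is a hypothetical stand-in for a codec's quantiser (plain rounding to a step size), and plain per-channel SNR stands in for the "more established methods", which in a real encoder would be a psychoacoustic model. None of this is how lame actually does it; it only shows the shape of "quantise in M/S, reconvert, judge in L/R".

```python
import math

def to_ms(left, right):
    """Sum/difference form used in this thread: M = L + R, S = L - R."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def to_lr(mid, side):
    """Inverse: L = (M + S) / 2, R = (M - S) / 2."""
    left = [(m + s) / 2 for m, s in zip(mid, side)]
    right = [(m - s) / 2 for m, s in zip(mid, side)]
    return left, right

def quantise(samples, step):
    """Hypothetical stand-in for a codec quantiser: round to a multiple of step."""
    return [step * round(x / step) for x in samples]

def snr_db(reference, decoded):
    """Signal-to-noise ratio in dB; infinite when the decode is exact."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)

def check_ms_candidate(left, right, mid_step, side_step):
    """Quantise in M/S, reconvert to L/R, and judge the result per channel."""
    mid, side = to_ms(left, right)
    dec_left, dec_right = to_lr(quantise(mid, mid_step), quantise(side, side_step))
    return snr_db(left, dec_left), snr_db(right, dec_right)

# Arbitrary test tones, not real audio.
left = [round(1000 * math.sin(0.05 * n)) for n in range(200)]
right = [round(800 * math.sin(0.05 * n + 0.3)) for n in range(200)]

exact = check_ms_candidate(left, right, mid_step=1, side_step=1)
starved_side = check_ms_candidate(left, right, mid_step=1, side_step=64)
```

With a step of 1 nothing is lost, so both channels come back exact; with a coarse step on S only, the damage shows up in both L and R once reconverted.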

regards'
3ngel
@ChiGung
You seem very well informed, so I'll ask you a question (one I've already asked) that I think is very important as a baseline. The question is:
Is a transform from L/R notation to M/S transparent and reversible bit by bit? In other words, if I take an L/R wav, MD5 it, then convert it to an M/S wav, then back to L/R and MD5 it, are the two wavs exactly the same (like 2*3 = 6*1)? Or is there some kind of "transformation" from one notation to the other, in other words some kind of distortion? I ask again because I haven't received a clear response so far, and I think it's important to know before discussing further.
ChiGung
QUOTE(3ngel @ Jun 3 2005, 03:02 PM)
Is a transform from L/R notation to M/S transparent and reversible bit by bit? In other words, if I take an L/R wav, MD5 it, then convert it to an M/S wav and MD5 it, are the two wavs exactly the same (like 2*3 = 6*1)?

As Ariakis and David described, the initial conversion between L/R and M/S is practically lossless (just a 1-bit rounding error could be involved, which could perhaps do with being dithered, but it does not amount to anything in this context).

Showing some ignorance here: I don't know at what point lame applies the conversion. It could be applied before the MDCT domain transformation (before a stretch of level samples is summarised as the states of a range of frequency bands), or it may be applicable to each band after the transform.

Anyway, although the L/R to M/S conversion is almost as lossless as a=b, the final decoded (acoustic) result of the quantisation noise introduced afterwards by the lossy encoding is different depending on whether it goes onto an L/R phrase or an M/S phrase.
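To make that concrete, here is a minimal sketch using the sum/difference definitions from this thread (M = L + R, S = L - R). The function names and the sample values are made up for illustration; the "noise" is just a constant added to one of the two signals before decoding.

```python
def decode(mid, side):
    """L = (M + S) / 2, R = (M - S) / 2."""
    return (mid + side) / 2, (mid - side) / 2

def decoded_error(l, r, mid_noise=0.0, side_noise=0.0):
    """Error seen in L and R when noise is added to M and/or S before decoding."""
    mid, side = l + r, l - r
    dec_l, dec_r = decode(mid + mid_noise, side + side_noise)
    return dec_l - l, dec_r - r

noise_on_mid = decoded_error(100, 60, mid_noise=4)    # (2.0, 2.0)
noise_on_side = decoded_error(100, 60, side_noise=4)  # (2.0, -2.0)
```

Noise put on M comes out identical in both decoded channels, while noise put on S comes out with opposite sign in L and R. Whether either of those is audible is a separate, psychoacoustic question that this arithmetic does not answer.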
QUOTE
Or there is a kind of "trasformation" from one notation to another? In other word some kind of distortion?

The distortion is definitely added after the LR-MS conversion, as properly stated earlier.
There shouldn't really be any loss of sound definition associated with smart M/S use.
I'm personally playing chaos's advocate with appearances ; )

edit: 1 bit is not significant here (the first few least significant bits contain mostly chaos and rounding patterns anyway; that's one reason why it can be best to dither them after processing)
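For anyone curious where that 1 bit comes from, here is a small sketch. It assumes M and S are both halved so that they fit the original word length, which is one possible way to store them and not necessarily what any particular encoder does. The second pair of functions shows a commonly used reversible integer form that keeps S at full precision; all function names are made up for illustration.

```python
def halved_ms(l, r):
    """Halve both M and S to keep the original word length: the LSB is dropped."""
    return (l + r) >> 1, (l - r) >> 1

def halved_lr(m, s):
    """Inverse of halved_ms, as far as one exists."""
    return m + s, m - s

def reversible_ms(l, r):
    """Halve only M and keep S = L - R whole (S then needs one extra bit)."""
    return (l + r) >> 1, l - r

def reversible_lr(m, s):
    """L + R and L - R always share parity, so S's low bit restores M's lost bit."""
    total = (m << 1) | (s & 1)  # reconstructs L + R exactly
    return (total + s) >> 1, (total - s) >> 1

lossy_example = halved_lr(*halved_ms(5, 2))              # (4, 2): L is off by one
exact_example = reversible_lr(*reversible_ms(5, 2))      # (5, 2)
```

In the halved form the error only appears when L + R is odd, and then it is a single LSB on one channel, which is why it is not worth worrying about next to what the lossy stage does afterwards.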
Jebus
You know this sort of thing comes up all the time, not even over stereo necessarily. The problem is that people have the idea of a "free lunch", ie preserving EVERYTHING EXACTLY the way it was, but at most 1/7th the size.

This is impossible. A lossless encoding is approximately 800kbps. MP3 gives you a MAXIMUM bitrate of 320kbps. Roughly 60% of the stuff has to get thrown out. What shall we throw out? Well, wasted stereo image is one thing, frequency response is another. But we constantly hear stuff like this:

"I use lame --preset insane -m s --lowpass 22000"

-m s and a super-high lowpass are NOT A FREE LUNCH. Bits better spent elsewhere will needlessly be allocated to satisfy your "everything must be preserved" ethos, and as a result - it sounds like shit vs. the regular preset. Only 320kbps are available, people! Let's spend it wisely! joint stereo is (probably even moreso than a proper lowpass setting) a braindead decision - works 99% flawlessly, and saves tons of bitrate for other more important things.
3ngel
@ChiGung
Thanks, you've finally been very clear :) So in the end it is lossy by a 1-bit error. OK, this is what I wanted to know :) Now, as someone said (perhaps you?), in common language L/R = M and the rest = S; this is not exact, because M and S are two faces of the same medal. In my opinion M and S are each 50% important for preserving a transparent sound. Now the question: how are M and S weighted in the process of bitrate quantisation? 50%? Or more towards S (which in the end is the FUNDAMENTAL thing that turns a mono sound into a Real Sound)?

@Jebus
People who think about obtaining EXACTLY the same quality at 1/7 of the size are simply out of this world :) Moreover, I (personally) am not saying here that joint is bad. I want to know how it works, and what the REAL tricks and defects are. In the end I could say this: "Well guys, I'm a purist, I don't mind size, and I don't want a 1-bit error, so I use 320 L/R. This configuration is the most preserving". And I would be right in saying this. I don't think it sounds like shit compared with joint; at the least I think it can sound equal, because each method has its problems: L/R has a possible noise problem on each separate channel, while M/S has a possible stereo problem caused by bad quantisation of the S part. In the end I think a "perfect configuration" does not exist, because we are speaking about psychoacoustics, and it's more volatile than you can imagine. For example, the same L/R encode could sound better on some systems and worse on others, and the same goes for an M/S encode, because sound cards, amps and speakers each enhance or suppress frequencies, and a particular combination could reveal the particular defects of one encoding method or take advantage of it, making it sound better :)
ChiGung
QUOTE
i don't want 1 bit error, so i use 320 L/R. This configuration is the most preserving"

Its not, forget about that bit, it means less than a dime to a millionaire.
2Bdecided
QUOTE(3ngel @ Jun 3 2005, 10:30 AM)
QUOTE(2Bdecided @ Jun 3 2005, 10:16 AM)
M=L+R
S=L-R

L=(M+S)/2
R=(M-S)/2

how lossless do you want? ;)

You can divide by sqrt(2) at each stage instead, or divide by 2 at the first stage instead of the second - same result either way. Obviously you need one extra bit either LSB or MSB (depending on where you put the divide by 2) to prevent rounding errors / overflow. Given this, and integers at the start, the process is completely lossless.

Well, I don't think it's so simple. If it were, there would be no problems coding 2ch surround sources in J/S, but from what I've read so far, J/S is not recommended in a surround context, because it alters (or destroys) the surround. If it were lossless (and it was all a question of bitrate), there would be no problems coding everything (surround or not), because it would be perfectly reversible.
*



You've had the answer - transforming between L/R and M/S is lossless - but lossy encoders don't preserve either form bit-accurate - because they're lossy. The S "channel", in particular, is damaged. Intentionally.
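The MD5 experiment asked about earlier in the thread can be sketched in a few lines. This is only an illustration under stated assumptions: the samples are random 16-bit integers and not real audio, the helper names are made up, and M and S are held in Python integers, which gives them the extra bit mentioned above for free.

```python
import hashlib
import random
import struct

def ms_encode(left, right):
    """M = L + R, S = L - R (needs one bit more than the input word length)."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """L = (M + S) / 2, R = (M - S) / 2; both numerators are always even."""
    left = [(m + s) // 2 for m, s in zip(mid, side)]
    right = [(m - s) // 2 for m, s in zip(mid, side)]
    return left, right

def pcm_md5(left, right):
    """MD5 of interleaved 16-bit little-endian PCM samples."""
    interleaved = [x for pair in zip(left, right) for x in pair]
    packed = struct.pack("<%dh" % len(interleaved), *interleaved)
    return hashlib.md5(packed).hexdigest()

rng = random.Random(0)  # arbitrary test signal
left = [rng.randint(-32768, 32767) for _ in range(1000)]
right = [rng.randint(-32768, 32767) for _ in range(1000)]

mid, side = ms_encode(left, right)
left2, right2 = ms_decode(mid, side)
same_md5 = pcm_md5(left, right) == pcm_md5(left2, right2)

# Why the extra bit is needed: two full-scale samples sum to 65534,
# which no longer fits in a signed 16-bit word.
extreme_mid = ms_encode([32767], [32767])[0][0]
```

The checksums match because nothing is rounded anywhere. Any difference in a decoded MP3 comes from the lossy stage that follows, not from the L/R to M/S step itself.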

Cheers,
David.
3ngel
QUOTE(ChiGung @ Jun 3 2005, 05:09 PM)
Its not, forget about that bit, it means less than a dime to a millionaire.
*


It was an extreme example, not me :)
I agree that 1 bit is negligible, but you have to admit that between 1-bit lossy and 0-bit lossy someone can prefer the second (considering that in the average case they sound the same) :)
ChiGung
I dont have to admit that, its a red herring anyway.

eof
3ngel
QUOTE(ChiGung @ Jun 3 2005, 05:57 PM)
I dont have to admit that, its a red herring anyway.


Whoa, I was only joking, don't push yourself. Yes, the problems lie elsewhere.
tgoose
QUOTE(3ngel @ Jun 3 2005, 04:49 PM)
In the end I could say this: "Well guys, I'm a purist, I don't mind size, and I don't want a 1-bit error, so I use 320 L/R. This configuration is the most preserving". And I would be right in saying this. I don't think it sounds like shit compared with joint; at the least I think it can sound equal, because each method has its problems: L/R has a possible noise problem on each separate channel, while M/S has a possible stereo problem caused by bad quantisation of the S part.
*


That doesn't make any sense - LR can only be less preserving; think about this: Let's say we're using 320kbps (the figures I'm using will not be correct at all, since I don't have any first-hand experience, but they demonstrate the point). Conversion to MS will cause a 1 bit error on one channel ~50% of the time (that is, 1/256th difference on 1/4 samples). I think someone said there was a way of removing this error, as well.
Now the original file, in wav, is 1411kbps (I think?). With the MS conversion, let's say we gain 300kbps, so that almost losslessly the file is down to 1100kbps. Thus, the lossy compression bit only has to "get rid" of 800kbps, instead of 1,100kbps using LR encoding. Also, joint stereo switches from MS to LR when that's more efficient, so if at any time there's more benefit in using LR, it will be used in a joint stereo compression.
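The arithmetic behind those figures can be checked quickly. The CD PCM parameters below (44.1 kHz, 16 bit, 2 channels) are standard; the 300kbps "gain" from the MS conversion is only the illustrative figure from the post above, not a measurement, and the variable names are made up.

```python
SAMPLE_RATE_HZ = 44100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Uncompressed CD-quality PCM bitrate in kbps.
pcm_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000  # 1411.2

mp3_kbps = 320  # maximum MP3 bitrate used in the example

def to_discard(source_kbps, target_kbps):
    """How many kbps the lossy stage has to get rid of."""
    return source_kbps - target_kbps

# Plain L/R: everything above 320 kbps has to go.
lr_discard = to_discard(pcm_kbps, mp3_kbps)  # about 1091 kbps

# Illustrative assumption only: the MS conversion "gains" 300 kbps first.
hypothetical_ms_gain_kbps = 300
ms_discard = to_discard(pcm_kbps - hypothetical_ms_gain_kbps, mp3_kbps)  # about 791 kbps

kept_fraction = mp3_kbps / pcm_kbps  # about 0.227 of the original bitrate
```

So the 1411kbps guess is right, and the rounded 1,100 and 800 figures follow from it once the assumed 300kbps saving is granted.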
3ngel
QUOTE(tgoose @ Jun 3 2005, 06:49 PM)
That doesn't make any sense - LR can only be less preserving; think about this: Let's say we're using 320kbps (the figures I'm using will not be correct at all, since I don't have any first-hand experience, but they demonstrate the point). Conversion to MS will cause a 1 bit error on one channel ~50% of the time (that is, 1/256th difference on 1/4 samples). I think someone said there was a way of removing this error, as well.
Now the original file, in wav, is 1411kbps (I think?). With the MS conversion, let's say we gain 300kbps, so that almost losslessly the file is down to 1100kbps. Thus, the lossy compression bit only has to "get rid" of 800kbps, instead of 1,100kbps using LR encoding. Also, joint stereo switches from MS to LR when that's more efficient, so if at any time there's more benefit in using LR, it will be used in a joint stereo compression.
*


Your post doesn't make any sense at all. Re-read it and get a better understanding of how J/S works :)
ChiGung
If you had the faintest clue about what you are trying to argue about, you could have read some sense in tgoose's post :P ;)
tgoose
I did rush the post since I had to leave, but I don't think I actually put anything inaccurate in apart from the figures.