SoundExpert explained, Methodology issues
Nov 24 2010, 13:27
Post #1
Group: Members Posts: 325 Joined: 14-December 01 Member No.: 641
I found this thread among SoundExpert referrals and was a bit surprised by the almost complete misunderstanding of the SE testing methodology, and particularly of how the diff signal is used in SE audio quality metrics. The discussion on the topic from 2006 actually seems more meaningful. So I decided to post some SE basics here for reference purposes. I will use a thought experiment which is nonetheless close to reality.

Suppose we have two sound signals: the main one and the side one. They could be, for example, a short piano passage and some noise. We can prepare several mixes of them in different proportions.

After normalization all mixes have equal levels, and we can evaluate the perceptibility of the side signal in the mixes. Here at SE we found that this perceptibility is a monotonic function of the side signal level and looks like this:

Figure: Side signal perception

(1) In other words, there is a relationship between the objectively measured level of the side signal and its subjectively estimated perceptibility in the mix. And what is more:
(a) this relationship is well described by a 2nd-order curve (assuming levels are in dB)
(2) These side stimulus perceptibility curves are the core of the SE rating mechanism. Each device under test has its own curve, plotted on the basis of SE online listening tests.
(3) Side signals are the difference signals of the devices being tested. Levels of side signals are expressed in dB of the Difference level parameter, which in our case is exactly equal to the RMS level of the side signal.
(4) Subjective grades of perceptibility are the anchor points of the 5-grade impairment scale.
(5) The audio metric beyond the threshold of audibility is determined by extrapolation of those 2nd-order curves. Virtual grades in the extrapolated area can be considered objective quality parameters that take the peculiarities of human hearing into account.

So, yes, the difference signal is used in SE testing. We take into account both its level and how the human auditory system perceives it together with the reference signal. Some difference signals with fairly high levels still remain almost imperceptible against the background of the reference signal, and vice versa; the perceptibility curves reflect this.

This is the concept. Many parts of it still need thorough verification in carefully designed listening tests, which are beyond SE's means. All we can do is analyze the grades returned by SE visitors. This will certainly be done, yet it can't replace properly organized listening tests. The SE testing methodology is new and questionable, but all the assumptions look reasonable and the SE ratings look promising, at least to me.
Time will tell.
--------------------
keeping audio clear together - soundexpert.org
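For readers who want to play with points (1) to (5), here is a minimal Python sketch of the idea, not SE's actual implementation. It assumes the Difference level is expressed in dB relative to the RMS of the main signal, and the signals, levels and grades below are made-up illustration data. The names `mix_at_level`, `fit_quadratic` and `grade_at` are hypothetical helpers.

```python
import math

def rms_db(x):
    """RMS level of a sample sequence in dB (re. amplitude 1.0)."""
    return 20.0 * math.log10(math.sqrt(sum(v * v for v in x) / len(x)))

def mix_at_level(main, side, diff_level_db):
    """Add side to main at diff_level_db (assumption: dB relative to main's RMS),
    then normalize the mix back to main's RMS so all mixes have equal level."""
    gain = 10.0 ** ((rms_db(main) + diff_level_db - rms_db(side)) / 20.0)
    mix = [m + gain * s for m, s in zip(main, side)]
    norm = 10.0 ** ((rms_db(main) - rms_db(mix)) / 20.0)
    return [norm * v for v in mix]

def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

def grade_at(coeffs, level_db):
    """Evaluate the fitted 2nd-order curve at a given difference level."""
    a, b, c = coeffs
    return a * level_db ** 2 + b * level_db + c

# Hypothetical test signals: a sine as "main", a deterministic noise-like "side".
main = [math.sin(2.0 * math.pi * 5 * k / 256) for k in range(256)]
side = [((k * 37) % 17 - 8) / 8.0 for k in range(256)]
mix = mix_at_level(main, side, -20.0)

# Hypothetical listening-test results: difference level (dB) vs. mean grade
# on the 5-grade impairment scale (5 = imperceptible).
levels_db = [-10.0, -20.0, -30.0, -40.0]
grades = [1.6, 2.4, 3.4, 4.6]

coeffs = fit_quadratic(levels_db, grades)
# Extrapolate below the audibility threshold to get a "virtual grade".
virtual_grade = grade_at(coeffs, -50.0)
```

With this invented data the extrapolated value lands above 5, which is how a "virtual grade" beyond the audible range would show up. Real curves would of course come from collected listener grades, not from four hand-picked points.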

Nov 30 2010, 09:09
Post #2
Group: Members Posts: 325 Joined: 14-December 01 Member No.: 641
QUOTE
It's not hard to imagine the possibility of signal pairs (main, side) where you can't hear any difference between main and main+side but you can easily hear a difference between main and main+0.5*side.

In practice, never. In all cases the perception of gradually unmasked artifacts is a monotonic function. That was also confirmed by B. Feiten in the already mentioned "Measuring the Coding Margin of Perceptual Codecs with the Difference Signal" (AES Preprint # 4417). This is the main point of the SE metric, as stated in the first post (above the graph). Once again: not a single case where the curve was not monotonic, and numerous cases of monotonic behavior. So I treat this as a fact.

QUOTE
Hint: phase is a bitch. ;-) Your implicit assumption is that both signals are independent. But this is not necessarily the case with perceptual audio coders. Take for example the MPEG-4 tool called PNS (perceptual noise substitution). It just replaces some high-frequency noise with synthetically generated noise of the same level. This is done by transmitting the noise level only. Obviously, we can use this tool in cases where the main perceptual feature is the energy level and nothing else is important. Then we have the following properties: the noise level of the original matches the noise level of the encoded result, so energy(main) = energy(main+side). Probability theory tells us that main and main+side are orthogonal. This implies a coherence between main and side of 0.7 -- ZERO POINT SEVEN. Hardly independent. This also implies that a 50/50 mix -- main+0.5*side -- would lose 3dB power. You can easily compute this via
CODE
main = [1 0];                    % original noise, unit energy
side = [0 1] - main;             % difference: orthogonal substitute minus original
20*log10(norm(main+0.5*side))    % level of the 50/50 mix in dB, about -3
(Matlab code)
So, by attenuating the sample-by-sample difference we actually amplify the perceived difference (since we lose power) in this case! What does that tell us? It tells us that you overrate sample-by-sample differences.
Perceptual audio coders try to retain certain things so it sounds similar and tolerate other losses. And you're focussing on the "other losses" (as well). What you're doing is basically violating some of a perceptual encoder's principles (like keeping energy levels similar no matter how large the sample-by-sample difference will be). By amplifying the difference you could destroy some signal properties the encoder and our HAS care about much more than you do. Sound perception is not as simple as you want us to believe. Sample-by-sample differences are not important. And "extrapolating artefacts" this way is nothing but a big waste of time. Even testing with "attenuated artefacts" doesn't tell you anything. Your methodology breaks down because you're assuming that the difference is independent from the original. It is not.

I didn't make such an assumption, quite the opposite - see 1b in the first post. Nevertheless, the case you describe is really interesting. Exaggerated and simplified a bit, it looks like this: we have a sound excerpt with a time interval (between tonal parts) that consists purely of, say, white noise. We also have a coder which can only substitute that noise with uncorrelated noise whenever it detects that there are no tonal parts during the interval. Then the diff. signal will consist of an amplified noise portion (being uncorrelated, the two noises add rather than cancel). So the version of our excerpt with amplified differences will have a stronger noise part, which can be detected in listening tests, while in practice this is not important for the HAS. Is this the case you wanted to produce? If yes, I will examine it more carefully. It is really interesting as it helps to determine the limits of the metric.
--------------------
keeping audio clear together - soundexpert.org
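The two-sample toy case from the quoted Matlab snippet can be reproduced in a few lines of Python, as a rough sketch only. The vectors are the same ones used in the quote. The variable `coded` (the substituted noise) and the helpers `dot`, `level_db` and `mix` are names introduced here for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def level_db(v):
    """Energy of a vector in dB relative to unit energy."""
    return 10.0 * math.log10(dot(v, v))

main = [1.0, 0.0]                            # original noise, unit energy
coded = [0.0, 1.0]                           # substituted noise: same energy, orthogonal
side = [c - m for c, m in zip(coded, main)]  # difference signal

# Normalized correlation between the original and the difference signal.
corr = dot(main, side) / math.sqrt(dot(main, main) * dot(side, side))

def mix(gain):
    """main plus the difference signal scaled by gain."""
    return [m + gain * s for m, s in zip(main, side)]

half_mix_db = level_db(mix(0.5))   # the attenuated-difference mix
full_mix_db = level_db(mix(1.0))   # the coder output itself
side_db = level_db(side)           # level of the difference signal
```

The correlation comes out at about -0.707 (the 0.7 in the quote is its magnitude), the half-difference mix sits about 3 dB below the original, the coder output is back at 0 dB, and the difference signal itself is about 3 dB above the original noise, which matches the "amplified noise portion" remark in the reply.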

Nov 30 2010, 16:24
Post #3
ReplayGain developer Group: Developer Posts: 4589 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409
QUOTE
In all cases the perception of gradually unmasked artifacts is a monotonic function.

How can you say this when SebG and Woodinville both gave you examples to the contrary?
I hit the exact problem Woodinville describes using the method I posted on the first page of this thread: a listener gets stuck in a "false" minimum of audibility because doubling the difference gives you the original signal back (with the part "removed" by the codec inverted, but that difference is not usually audible). Hardly monotonic: the chance of hearing the artefact becomes zero at a single gain setting (+6dB) and (with the specific audio I used - YMMV!) leaps back to the "expected" function very quickly either side of that.
Cheers, David.
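The false minimum at +6dB is easy to reproduce with a toy signal. The Python sketch below is only an illustration under simplifying assumptions: the "codec" is a hypothetical one that simply drops one of two sine components, and `tone`, `component_amplitude` and `amplified` are helper names invented here.

```python
import cmath
import math

N = 64  # samples per analysis block

def tone(bin_index, amp):
    """Sine wave with an integer number of cycles in the block."""
    return [amp * math.sin(2.0 * math.pi * bin_index * k / N) for k in range(N)]

def component_amplitude(x, bin_index):
    """Amplitude of the sinusoid at bin_index, taken from a single DFT bin."""
    acc = sum(v * cmath.exp(-2j * math.pi * bin_index * k / N)
              for k, v in enumerate(x))
    return 2.0 * abs(acc) / N

low = tone(3, 1.0)
high = tone(10, 0.5)
main = [l + h for l, h in zip(low, high)]     # original signal
coded = list(low)                             # toy codec: drops the high part
diff = [c - m for c, m in zip(coded, main)]   # difference signal = -high

def amplified(gain):
    """Original plus the difference signal scaled by gain (gain 1 = codec output)."""
    return [m + gain * d for m, d in zip(main, diff)]

# Amplitude of the high component for several difference gains.
high_amp = {g: component_amplitude(amplified(g), 10) for g in (0.0, 1.0, 2.0, 3.0)}
```

At gain 2 (+6dB) the high component is back at its original amplitude of 0.5, only inverted, so the magnitude spectrum matches the original, while at gain 1 it is gone and at gain 3 it is twice as strong. In this toy case the deviation from the original is anything but monotonic in the gain.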

Nov 30 2010, 17:38
Post #4
Group: Members Posts: 325 Joined: 14-December 01 Member No.: 641
QUOTE
How can you say this when SebG and Woodinville both gave you examples to the contrary? I hit the exact problem Woodinville describes using the method I posted on the first page of this thread: a listener gets stuck in a "false" minimum of audibility because doubling the difference gives you the original signal back (with the part "removed" by the codec inverted, but that difference is not usually audible). Hardly monotonic: the chance of hearing the artefact becomes zero at a single gain setting (+6dB) and (with the specific audio I used - YMMV!) leaps back to the "expected" function very quickly either side of that.

In many papers devoted to "coding margin", special filtering is recommended to eliminate those "ghost" frequencies. We also use it.
--------------------
keeping audio clear together - soundexpert.org

Dec 1 2010, 03:11
Post #5
Group: Members Posts: 1355 Joined: 9-January 05 From: JJ's office. Member No.: 18957
QUOTE
How can you say this when SebG and Woodinville both gave you examples to the contrary? I hit the exact problem Woodinville describes using the method I posted on the first page of this thread: a listener gets stuck in a "false" minimum of audibility because doubling the difference gives you the original signal back (with the part "removed" by the codec inverted, but that difference is not usually audible). Hardly monotonic: the chance of hearing the artefact becomes zero at a single gain setting (+6dB) and (with the specific audio I used - YMMV!) leaps back to the "expected" function very quickly either side of that.

QUOTE
In many papers devoted to "coding margin", special filtering is recommended to eliminate those "ghost" frequencies. We also use it.

How do you know what "it" is? You have to work specifically to every bit rate, every bandwidth, every sampling rate, every different encoder? This is not useful.
--------------------
-----
J. D. (jj) Johnston

Dec 1 2010, 09:17
Post #6
Group: Members Posts: 325 Joined: 14-December 01 Member No.: 641
QUOTE
In many papers devoted to "coding margin", special filtering is recommended to eliminate those "ghost" frequencies. We also use it.

QUOTE
How do you know what "it" is? You have to work specifically to every bit rate, every bandwidth, every sampling rate, every different encoder? This is not useful.

By subtracting a portion of the reference signal from the output signal, it's not hard to figure out which frequencies are "ghosted" and remove them with an FIR filter. So, yes, we do it for every test sample with amplified artifacts. This helps to get smoother perception curves. Every item tested at SE has its own unique curve, plotted from the results of SE listening tests. By extrapolating that curve we get the resulting quality rating for each tested item.
--------------------
keeping audio clear together - soundexpert.org
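The post does not say how the "ghosted" frequencies are located, so the following Python sketch is only one plausible reading, not SE's procedure: flag the DFT bins where the reference has energy but the coder output has almost none. The function names, the `ratio` and `floor` thresholds, and the two-tone test signal are all assumptions made for illustration, and the FIR filtering step itself is not shown.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitudes of DFT bins 0..n/2 of a real sequence (plain O(n^2) DFT)."""
    n = len(x)
    return [abs(sum(v * cmath.exp(-2j * math.pi * b * k / n)
                    for k, v in enumerate(x)))
            for b in range(n // 2 + 1)]

def ghost_bins(reference, output, ratio=0.1, floor=1e-6):
    """Bins where the reference has content but the output has (almost) none.

    ratio: output/reference magnitude below which a bin counts as removed.
    floor: bins weaker than floor * peak in the reference are ignored.
    Both thresholds are arbitrary illustration values.
    """
    ref_mag = dft_magnitudes(reference)
    out_mag = dft_magnitudes(output)
    peak = max(ref_mag)
    return [b for b, (r, o) in enumerate(zip(ref_mag, out_mag))
            if r > floor * peak and o < ratio * r]

# Hypothetical example: the coder keeps the tone at bin 3 and drops the one at bin 10.
N = 64
low = [math.sin(2.0 * math.pi * 3 * k / N) for k in range(N)]
high = [0.5 * math.sin(2.0 * math.pi * 10 * k / N) for k in range(N)]
reference = [l + h for l, h in zip(low, high)]
output = list(low)

removed = ghost_bins(reference, output)
```

Here `removed` contains only bin 10, the component that would reappear inverted once the difference is amplified past +6dB. A real signal would need block-wise analysis and windowing, and the thresholds would have to be tuned per sample, which is presumably the per-clip effort discussed in the replies.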

Dec 1 2010, 22:03
Post #7
Group: Members Posts: 1355 Joined: 9-January 05 From: JJ's office. Member No.: 18957
QUOTE
In many papers devoted to "coding margin", special filtering is recommended to eliminate those "ghost" frequencies. We also use it.

QUOTE
How do you know what "it" is? You have to work specifically to every bit rate, every bandwidth, every sampling rate, every different encoder? This is not useful.

QUOTE
By subtracting a portion of the reference signal from the output signal, it's not hard to figure out which frequencies are "ghosted" and remove them with an FIR filter. So, yes, we do it for every test sample with amplified artifacts. This helps to get smoother perception curves. Every item tested at SE has its own unique curve, plotted from the results of SE listening tests. By extrapolating that curve we get the resulting quality rating for each tested item.

So, it's "by clip". This still seems useless.
--------------------
-----
J. D. (jj) Johnston

Dec 1 2010, 23:47
Post #8
Group: Members Posts: 552 Joined: 22-May 05 From: France Member No.: 22220

Dec 1 2010, 23:55
Post #9
Group: Members Posts: 1355 Joined: 9-January 05 From: JJ's office. Member No.: 18957
QUOTE
This still seems useless.

QUOTE
So which options are available to reveal sub-threshold differences in a listening test?

This leads to a very simple question: What does "sub-threshold differences in a listening test" mean? Therein lies, perhaps, the underlying philosophical problem here.
--------------------
-----
J. D. (jj) Johnston

Dec 2 2010, 09:35
Post #10
Group: Members Posts: 552 Joined: 22-May 05 From: France Member No.: 22220
QUOTE
This leads to a very simple question: What does "sub-threshold differences in a listening test" mean?

Differences that can be proven to exist by technical means, but are undetectable with a standard listening test.
Let me try this analogy: someone has to leave the next day on a 6-month boat trip. He has to pack canned food and can choose between two unlabeled lots that look identical. Someone told him that the lots have different "best before" dates: one expires in 1 month, the other in 10 months. He tastes a bit from each, but both taste absolutely identical. He knows that best before dates don't mean that the food will be bad the day after, but his chances of surviving the trip are probably better if he picks the fresher one. (btw, the boat is too small to take both)

Dec 2 2010, 11:25
Post #11
ReplayGain developer Group: Developer Posts: 4589 Joined: 5-November 01 From: Yorkshire, UK Member No.: 409
QUOTE
This leads to a very simple question: What does "sub-threshold differences in a listening test" mean?

QUOTE
Differences that can be proven to exist by technical means, but are undetectable with a standard listening test. Let me try this analogy: someone has to leave the next day on a 6-month boat trip. He has to pack canned food and can choose between two unlabeled lots that look identical. Someone told him that the lots have different "best before" dates: one expires in 1 month, the other in 10 months. He tastes a bit from each, but both taste absolutely identical. He knows that best before dates don't mean that the food will be bad the day after, but his chances of surviving the trip are probably better if he picks the fresher one. (btw, the boat is too small to take both)

Comparing codecs isn't like this at all. Comparing codecs is an apples to oranges comparison - you don't know that artefacts 6dB below threshold are better than artefacts 5dB below threshold - 1) because the characteristic of the artefacts could be different, and 2) you haven't said what "better" means. Better for what? Not for just listening (either is fine), so for what?
Cheers, David.

Dec 2 2010, 13:09
Post #12
Group: Members Posts: 552 Joined: 22-May 05 From: France Member No.: 22220
QUOTE
Comparing codecs isn't like this at all. Comparing codecs is an apples to oranges comparison - you don't know that artefacts 6dB below threshold are better than artefacts 5dB below threshold - 1) because the characteristic of the artefacts could be different, and 2) you haven't said what "better" means. Better for what? Not for just listening (either is fine), so for what?

Do we agree that there are 3 types of quality levels, from best to worst:
1- artefacts are non-existent (-inf), like in lossless coding
2- artefacts are below the hearing threshold
3- artefacts are audible, by at least one listener for at least one (killer) sample
In my view the better codec is the one that will remain in category 2 in any situation (e.g. inserting an Orban in the monitoring chain). Example: the original master is 24/96. Two lossy copies are made, one 16/44.1 and one mp3 at 320 kbps. Both sound identical to the master. I would say the 16/44.1 is better than the mp3, but if you can give arguments to the contrary, I'm all ears.

QUOTE
If I sing the same thing twice, what do you do to these two files to present them on SoundExpert.com?

SoundExpert won't work for this, nor will ABX, since there's a huge risk of false positives. A lot depends on where you switch from A to B. Small tempo and pitch differences will remain unnoticed when heard in isolation, but as soon as you jump from one to the other they can become apparent. This is the daily job of an audio editor: finding the best spot to switch inaudibly from one take to another. (hint: it's not always easy and I'm glad to be paid per hour)
Serge Smirnoff SoundExpert explained Nov 24 2010, 13:27