SoundExpert explained, Methodology issues
Serge Smirnoff
post Nov 24 2010, 13:27
Post #1





Group: Members
Posts: 370
Joined: 14-December 01
Member No.: 641



I found this thread among SoundExpert referrals and was a bit surprised by the almost complete misunderstanding of SE testing methodology, and particularly of how the diff signal is used in SE audio quality metrics. The discussion on the topic from 2006 actually seems more meaningful. So I decided to post some SE basics here for reference purposes. I will use a thought experiment, which is nevertheless close to reality.

Suppose we have two sound signals – the main and the side one. They could be for example a short piano passage and some noise. We can prepare several mixes of them in different proportions:
  • equal levels of main and side signals (0dB RMS)
  • half level of side signal (-6dB RMS)
  • quarter level of side signal (-12dB RMS)
  • 1/8 level of side signal (-18dB RMS)
  • 1/16 level of side signal (-24dB RMS)
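As a rough illustration of how such mixes could be prepared (this is a sketch, not SE's actual processing), the side signal can be scaled so that its RMS sits at the chosen level relative to the main signal, and each mix can then be brought to a common RMS. The test signals, the function names and the target RMS of 0.1 are all assumptions made for the example. Note that -6 dB corresponds to a gain of about 0.501, so "half level" is a very close approximation.

```python
import math
import random

def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def db_to_gain(db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def mix_at_level(main, side, side_db, target_rms=0.1):
    """Add `side` to `main` with the side RMS at `side_db` relative to the
    main RMS, then normalize the mix to `target_rms` (assumed value)."""
    gain = db_to_gain(side_db) * rms(main) / rms(side)
    mix = [m + gain * s for m, s in zip(main, side)]
    scale = target_rms / rms(mix)
    return [scale * v for v in mix]

# Hypothetical stand-ins: a 440 Hz tone as "main", seeded noise as "side".
rng = random.Random(0)
n = 4800
main = [math.sin(2 * math.pi * 440 * i / 48000) for i in range(n)]
side = [rng.uniform(-1.0, 1.0) for _ in range(n)]

# One normalized mix per level from the list above.
mixes = {db: mix_at_level(main, side, db) for db in (0, -6, -12, -18, -24)}
```

Every entry in `mixes` ends up with the same RMS, so only the proportion of side signal differs between them.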

After normalization all mixes have equal levels, and we can evaluate the perceptibility of the side signal in each mix. Here at SE we found that this perceptibility is a monotonic function of the side signal level and looks like this:

Figure: Side signal perception

(1) In other words, there is a relationship between the objectively measured level of the side signal and its subjectively estimated perceptibility in the mix. What is more:
(a) this relationship is well described by a second-order curve (assuming levels are in dB)
(b) the relationship holds for any sound signals, whether they are correlated or not; the only differences are the position and curvature of the curve.

(2) These side-stimulus perceptibility curves are the core of the SE rating mechanism. Each device under test has its own curve, plotted on the basis of SE online listening tests.
(3) Side signals are the difference signals of the devices being tested. Levels of side signals are expressed in dB of the Difference level parameter, which in our case is exactly equal to the RMS level of the side signal.
(4) Subjective grades of perceptibility are the anchor points of the 5-grade impairment scale.
(5) The audio metric beyond the threshold of audibility is determined by extrapolating those second-order curves. Virtual grades in the extrapolated area can be considered objective quality parameters that take the peculiarities of human hearing into account.
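Points (1a) and (5) can be sketched as a plain least-squares fit of a second-order curve to (level, grade) pairs, followed by evaluation of that curve outside the tested range. The data points below are invented for illustration and are not SE results; `fit_quadratic` and `predict` are hypothetical helper names, and SE's real fitting procedure may differ.

```python
def det3(q):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (q[0][0] * (q[1][1] * q[2][2] - q[1][2] * q[2][1])
            - q[0][1] * (q[1][0] * q[2][2] - q[1][2] * q[2][0])
            + q[0][2] * (q[1][0] * q[2][1] - q[1][1] * q[2][0]))

def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c)."""
    s = [sum(x ** k for x in xs) for k in range(5)]            # sums of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Normal equations for the unknowns (a, b, c).
    m = [[s[4], s[3], s[2]],
         [s[3], s[2], s[1]],
         [s[2], s[1], s[0]]]
    r = [t[2], t[1], t[0]]
    d = det3(m)
    coeffs = []
    for col in range(3):                                       # Cramer's rule
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = r[i]
        coeffs.append(det3(mc) / d)
    return tuple(coeffs)

def predict(coeffs, x):
    """Evaluate the fitted curve at level x (dB)."""
    a, b, c = coeffs
    return a * x * x + b * x + c

# Made-up anchor points: difference level in dB vs. grade on the 5-grade scale.
levels_db = [-10.0, -15.0, -20.0, -25.0, -30.0]
grades = [1.5, 2.4, 3.2, 3.9, 4.5]

coeffs = fit_quadratic(levels_db, grades)
virtual_grade = predict(coeffs, -40.0)   # extrapolated beyond the tested range
```

For this made-up data the fitted curve gives a virtual grade of about 5.4 at -40 dB, that is, a value above the top of the 5-grade scale, which is the kind of extrapolated figure point (5) refers to.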

So, yes, the difference signal is used in SE testing. We take into account both its level and how the human auditory system perceives it together with the reference signal. Some difference signals with fairly high levels still remain almost imperceptible against the background of the reference signal, and vice versa; the perceptibility curves reflect this.

This is the concept. Many parts of it still need thorough verification in carefully designed listening tests, which are beyond SE's means. All we can do is analyze the grades returned by SE visitors. This will certainly be done, yet it cannot replace properly organized listening tests.

SE testing methodology is new and questionable, but all assumptions look reasonable and SE ratings look promising, at least to me. Time will tell.


--------------------
keeping audio clear together - soundexpert.org
2Bdecided
post Nov 25 2010, 12:30
Post #2


ReplayGain developer


Group: Developer
Posts: 4945
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



Just to be clear, your graph example shows grades where the default noise level (0dB) is quite objectionable, and reducing the noise makes it less and less so - correct?

But with codec testing, you do kind of the opposite. The default noise level (0dB) is usually indistinguishable/transparent, or very nearly so, and to build the "worse quality" part of the curve (the part where people can hear the noise), you have to amplify the coding noise - correct?
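A minimal sketch of that amplification step, assuming time-aligned reference and processed signals of equal length: take the difference, scale it by a gain in dB, and add it back to the reference. The 8-bit rounding used as a stand-in "device", the function names, and the choice to express the difference level relative to the reference RMS are assumptions for the example, not SE's documented procedure.

```python
import math

def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def difference_level_db(reference, processed):
    """RMS of (processed - reference) in dB relative to the reference RMS."""
    diff = [p - r for p, r in zip(processed, reference)]
    return 20.0 * math.log10(rms(diff) / rms(reference))

def amplify_difference(reference, processed, gain_db):
    """Return reference + gain * (processed - reference)."""
    g = 10.0 ** (gain_db / 20.0)
    return [r + g * (p - r) for p, r in zip(processed, reference)]

# Hypothetical example: a sine as reference, coarse rounding as the "device".
n = 4800
reference = [0.5 * math.sin(2 * math.pi * 440 * i / 48000) for i in range(n)]
processed = [round(v * 128) / 128 for v in reference]

base_db = difference_level_db(reference, processed)
louder = amplify_difference(reference, processed, 12.0)   # noise raised by 12 dB
louder_db = difference_level_db(reference, louder)
```

With a gain of 0 dB the output is just the processed signal again; positive gains move the test item up the "worse quality" part of the curve while the reference content stays the same.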


People in this thread are saying the scale beyond "imperceptible" makes no sense. I'm not sure if that's true or not. What you're "measuring" (I put that in quotes - see later) is how far the coding noise sits below the threshold of audibility (or above, if it's audible at the default level). If the second-order curve theory holds true, then to do this you only need sufficient points on the curve where the difference is audible. Points on the curve where the difference is inaudible don't help, because the curve does become a flat line there.


There are several accepted ways to judge the threshold of audibility. I used this one...
QUOTE
Each masking threshold was determined by a 3-interval, forced choice task, using a one up two down transformed stair case tracking method. This procedure yields the threshold at which the listener will detect the target 70.7% of the time [Levitt, 1971]. The process is as follows.
For each individual measurement, the subject is played three stimuli, denoted A, B, and C. Two presentations consist of the masker only, whilst the third consists of the masker and target. The order of presentation is randomised, and the subject is required to identify the odd-one-out, thus determining whether A, B, or C contains the target. The subject is required to choose one of the three presentations in order to continue with the test, even if this choice is pure guesswork, hence the title “forced choice task.” If the subject fails to identify the target signal, the amplitude of the target is raised by 1 dB for the next presentation. If the subject correctly identifies the target signal twice in succession, then the amplitude of the target is reduced by 1 dB for the next presentation. Hence the amplitude of the target should oscillate about the threshold of detection, as shown in Figure 6.5. In practice, mistakes and lucky guesses by the listener typically cause the amplitude of the target to vary over a greater range than that shown. A reversal (denoted by an asterisk in Figure 6.5) indicates the first incorrect identification following a series of successes (upper asterisks), or the first pair of correct identifications following a series of failures (lower asterisks). The amplitudes at which these reversals occur are averaged to give the final masked threshold. An even number of reversals must be averaged, since an odd number would cause a +ve or –ve bias. Throughout these tests, the final six (out of eight) reversals were averaged to calculate each masked threshold.
The initial amplitude of the target is set such that it should be easily audible. Before the first reversal, whenever the subject correctly identifies the target twice, the amplitude is reduced by 6 dB. After the first reversal, whenever the subject fails to identify the target, the amplitude is increased by 4 dB. After the second reversal, whenever the subject correctly identifies the target twice, the amplitude is reduced by 2 dB. After the third reversal, the amplitude is always changed by 1 dB, and the following six reversals are averaged to calculate each masked threshold. This procedure allows the target amplitude to rapidly approach the masked threshold, and then finely track it. If the target amplitude were changed in 1 dB steps initially, then the descent to the masked threshold would take considerably longer, and add greatly to listener fatigue. In the case where the listener fails to identify the target initially, then the target amplitude is increased by 6 dB for each failed identification, up to the maximum allowed by the replay system (90 dB peak SPL at the listener’s head).

This is normally used for simple noise masking tone experiments. It seems to work OK with coding noise, but repetition of a moment of coded audio over and over again is quite mind numbing and makes people listen in a very different way to normal music listening. Whether it pushes their thresholds up or down I don't know. Quite a fascinating subject IMO!


It seems to me that your method is far kinder to listeners. If your second order curve fitting can be justified, then it's a really neat way of finding the threshold of audibility (the crossover from 5.0 "imperceptible" to 4.9 "just perceptible but not annoying" on the usual scale) without even having to test at that (difficult) level.
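If a second-order curve has already been fitted to the audible points, the threshold estimate is just the level at which that curve crosses 4.9. A small sketch, with invented coefficients and a hypothetical default difference level (none of these numbers come from SE):

```python
import math

def threshold_level_db(a, b, c, target=4.9):
    """Level x (dB) where a*x^2 + b*x + c equals `target`, taken on the branch
    where the grade falls as the level rises. Returns None if no crossing."""
    if a == 0:
        raise ValueError("expected a second-order curve (a != 0)")
    disc = b * b - 4.0 * a * (c - target)
    if disc < 0:
        return None
    # At this root the slope 2*a*x + b equals -sqrt(disc), i.e. it is <= 0.
    return (-b - math.sqrt(disc)) / (2.0 * a)

# Invented curve: grade = -0.002*x^2 - 0.23*x - 0.6, with x in dB.
a, b, c = -0.002, -0.23, -0.6
threshold_db = threshold_level_db(a, b, c)

# Hypothetical difference level of the device at its default setting.
default_level_db = -46.0
margin_db = threshold_db - default_level_db   # positive: noise below threshold
```

With these made-up numbers the crossing is at roughly -33.9 dB, so a device whose difference level is -46 dB would sit about 12 dB below the estimated threshold.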



So far so good. What I'm less convinced of is the implication that a given codec has so much "headroom", and that this is a "good thing".

For example, on the range of content tested, at a given bitrate/setting, a given codec might be transparent even with the noise elevated by 12dB. It scores well in your test. Fair enough. IMO it would be wrong to read too much into this result, e.g.:
1. It's tempting to think this means it's suitable for transcoding, but it might not be - it might fall apart when transcoded.
2. It's tempting to think this means that audible artefacts will be rarer (and/or less bad) with this codec than with one where the noise becomes audible when elevated by 3dB, but this might be very wrong - this wonderful codec which keeps coding noise 12dB below the threshold of audibility on the content tested might fall apart horribly on some piece of content that hasn't been tested.


I'm sure you know all this! I'm just thinking aloud.

Anyway, I find it fascinating. Thanks for the explanation.

Cheers,
David.


