Someone has just suggested an interesting approach to ABX testing, which they believed was standard ( and they should know! ). It's something that I haven't heard of before.
Here at HA, we usually do this:
A=Original sample ( O )
B=Coded sample ( C )
X=random choice of the above ( A or B = O or C )
So you'll get three clips in each trial, either ABA or ABB ( OCO or OCC )
This means you don't really need to listen to the first and second clips if the difference is obvious - just listen to the third clip, usually labelled X, and say what it is.
(There is also the version where you have X and Y, meaning you have ABAB or ABBA - more to listen to, but statistically identical (?).)
However, someone has just suggested the following to me:
For each trial, decide whether
A=original sample ( O )
B=coded sample ( C )
or
A=coded sample ( C )
B=original sample ( O )
Then have X=random choice of A or B ( O or C ).
This means you have four possibilities: OCO, OCC, COO, COC.
(Or, to put it another way, if A=O and B=C, then it's ABA, ABB, BAA, BAB)
This means that, for each trial, you must listen to the first and second audio samples, then decide whether the third audio sample is the same as the first or the second.
This means you can't so easily carry an idea of what the coded version's faults are in your head, because you don't know which is the coded version - but then, it doesn't matter which one the coded version is, because ( like our usual ABX ) all you need to do is match a pair to win!
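To make the difference between the two schemes concrete, here is a minimal Python sketch of how the trials could be generated. The function names (make_trial, correct_answer) and the randomize_ab flag are my own, purely for illustration; no real ABX tool is claimed to work this way.

```python
import random

def make_trial(randomize_ab, rng=random):
    """Return one trial as a 3-letter string: clip A, clip B, clip X.

    'O' is the original sample, 'C' is the coded sample.
    randomize_ab=False gives the usual ABX (A is always the original);
    randomize_ab=True gives the suggested variant (A/B assignment is random).
    """
    a, b = "O", "C"
    if randomize_ab and rng.random() < 0.5:
        a, b = b, a              # swap which clip is played first
    x = rng.choice([a, b])       # X is a random pick of A or B
    return a + b + x

def correct_answer(trial):
    """The listener only has to say whether X matches A or B."""
    return "A" if trial[2] == trial[0] else "B"

if __name__ == "__main__":
    rng = random.Random(1)
    print(sorted({make_trial(False, rng) for _ in range(200)}))
    print(sorted({make_trial(True, rng) for _ in range(400)}))
```

With randomize_ab=False only OCO and OCC can occur; with randomize_ab=True all four of OCO, OCC, COO and COC turn up. In both cases the right answer depends only on whether X matches A or B.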
I was told this is called "ABX", but I think it needs a different name, maybe rABX to indicate that A and B are also swapped at random, or XYZ?
Whatever you call it, I'd imagine this kind of testing is harder for people who are very familiar with "our kind" of ABX.
Mad fool that I am, I'm interested to know the relative sensitivity of ABX, ABXY, rABX, rABXY, and simple ABC pick the odd one out ( used in most psychoacoustic tests ).
The statistics for "really detecting a difference to 95% confidence" should be the same, but unless you have perfect concentration, I suspect the chances of passing a test with a given artefact are slightly different depending on the test methodology.
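For reference, the "95% confidence" figure can be worked out with a one-sided binomial test. This is only a sketch under my own assumptions: each trial is independent, and a listener who hears no difference guesses right with probability p_chance. That is 1/2 for the match-X-to-A-or-B variants; for a three-way pick-the-odd-one-out, where any of the three clips could be the odd one, it would be 1/3 instead. The function names are made up for illustration.

```python
from math import comb

def p_value(correct, trials, p_chance=0.5):
    """Probability of getting at least `correct` right out of `trials` by guessing."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )

def min_correct(trials, alpha=0.05, p_chance=0.5):
    """Smallest score whose guessing probability is <= alpha, or None if impossible."""
    for c in range(trials + 1):
        if p_value(c, trials, p_chance) <= alpha:
            return c
    return None

if __name__ == "__main__":
    print(min_correct(16))   # -> 12
    print(min_correct(10))   # -> 9
```

So with 16 trials you need 12 or more correct ( p is about 0.038 ), and with 10 trials you need 9. This only covers the guessing side of things; it says nothing about how likely a listener is to reach that score with each methodology, which is the part I'm asking about.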
Is this measurable? Does it matter? Do you think there'll be any difference in practice?
Cheers,
David.
P.S. The rABX approach was suggested as more indicative of real world listening. There's a chance that it might be less sensitive ( because you can't be sure that B contains any possible artefact while A does not, so you will be less certain going into the test ), but there's also a chance it may be more sensitive. Consider situations where the switching is out of your control, so you get A, B, X at predefined intervals: A is always preceded by you giving your previous response and then complete silence, while B is always preceded by A and then some silence. If the simple fact that A always comes first makes it sound different enough from B that you miss the ( real but tiny ) difference in B, swapping the two around may allow you to focus on the real difference, rather than the imagined one.
